PREFACE AND SUMMARY


This report is a part of a continuing Rand research effort in the
general area of mathematical programming. Increasingly, the practical
problems that very large organizations confront are highly structured,
with many decision variables and constraints. In the Air Force, problems
of long-term program planning and allocating scarce resources are
becoming more complex. The obvious importance of such problems, and
the intriguing mathematical possibilities for solving them, have led
to a voluminous technical literature. Unfortunately, little has been
done to distill and unify the essential concepts found in this
literature, with the result that the technical development of the field
and its practical application have been retarded.

The aim of this study is to identify and develop the concepts
central to the optimization of large structured systems, and to attempt
an organization of the literature around these concepts. It is hoped
that nonspecialists will find the study a coherent introduction to
large-scale optimization, and that the specialist will find it a source
of new insights and unifying concepts.

The author carried out this work as a consultant to The Rand
Corporation and also under the auspices of a Ford Foundation Faculty
Research Fellowship and National Science Foundation Grant GP-8740.

An earlier version of this study was published as Working Paper
144  by  the  Western  Management  Science  Institute  of  the  University  of 
California  at  Los  Angeles  and  has  been  used  there  and  at  Stanford 
University  as  a  supplementary  text.  It  is  being  published  in  this 
form  to  make  it  readily  available  to  the  Air  Force  and  other  users. 
1. INTRODUCTION


It is widely held that the development of efficient optimization
techniques for large structured mathematical programs is of great
importance in economic planning, engineering, and management science. A
mere glance at the bibliography of this paper will reveal the enormous
effort devoted to the subject in recent years. The purpose of this
paper is to suggest a unifying framework to help both the specialist
and nonspecialist cope with this vast and rapidly growing body of
knowledge.

The framework is based on a relative handful of fundamental
concepts. They can be classified into two groups: problem manipulations
and solution strategies. Problem manipulations are devices for restating
a given problem in an alternative form that is apt to be more amenable
to solution. The result is often what is referred to in the literature
as a "master" problem. Dualization of a linear program is one
familiar example of such a device. Section 2 discusses three others:
Projection, Inner Linearization, and Outer Linearization. Solution
strategies, on the other hand, reduce an optimization problem to a
related sequence of simpler optimization problems. This often leads
to "subproblems" amenable to solution by specialized methods. The
Feasible Directions strategy is a well-known example, and Sec. 3
discusses the Piecewise, Restriction, and Relaxation strategies. The
reader is probably already familiar with special cases of most of
these concepts, if not with the names used for them here; the new
terminology is introduced to emphasize the generality of the ideas
involved.


By assembling these and a few other problem manipulations and
solution strategies in various patterns, one can rederive the essential
aspects of most known large-scale programming algorithms (and even
design new ones). Section 4 illustrates this for Benders Decomposition,
Dantzig-Wolfe Decomposition, Rosen's Primal Partition Programming
method, Takahashi's "local" approach, and a procedure recently devised by
the author for nonlinear decomposition.

Although much of the presentation is elementary, for full
appreciation the reader will find it necessary to have a working knowledge
of the theory and computational methods of linear and nonlinear
programming at about the level of a first graduate course in each subject.

1.1  TYPES  OF  LARGE-SCALE  PROBLEMS 

It is important to realize that size alone is not the distinguishing
attribute  of  the  field  of  "large-scale  programming,"  but  rather  size 
in  conjunction  with  structure.  Large-scale  programs  almost  always 
have  distinctive  and  pervasive  structure  beyond  the  usual  convexity 
or  linearity  properties.  The  principal  focus  of  large-scale  programming 
is  the  exploitation  of  various  special  structures  for  theoretical  and 
computational  purposes. 

There are, of course, many possible types of structure. Among
the commonest and most important general types are these:
multidivisional, combinatorial, dynamic, and stochastic. Multidivisional
problems consist of a collection of interrelated "subsystems" to be
optimized.* The subsystems can be, for example, modules of

*See, e.g., Aoki 68, Bradley 67, Gould 59, Hass 68, Kornai and
Liptak 65, Lasdon and Schoeffler 66, Malinvaud 67, Manne and Markowitz
63, Parikh and Shephard 67, Rosen and Ornea 63, Tcheng 66.


an engineering system, reservoirs in a water resources system,
departments or divisions of an organization, production units of an industry,
or sectors of an economy. Combinatorial problems typically have a
large number of variables because of the numerous possibilities for
selecting routes, machine setups, schedules, etc.* Problems with
dynamic aspects are large because of the need to replicate constraints
and variables to account for several time periods.** And problems
with stochastic or uncertainty aspects are often larger than they would
otherwise be in order to account for alternative possible realizations
of imperfectly known entities.*** A method that successfully exploits
one specific structure can usually be adapted to exploit other specific
structures of the same general type. Perhaps needless to say,
problems are not infrequently encountered which fall simultaneously
into two or more of these general categories.

The presence of a large number of variables or constraints can
be due not only to the intrinsic nature of a problem as suggested
above, but also to the chosen representation of the problem.
Sometimes a problem with a few nonlinearities, for example, is expressed
as a completely linear program by means of piecewise-linear or
tangential linear approximation to the nonlinear functions or sets (cf.

*See, e.g., Dantzig 60, Dantzig, Blattner and Rao 67, Dantzig,
Fulkerson and Johnson 54, Dantzig and Johnson 64, Ford and Fulkerson
58, Gilmore and Gomory 61, 63, and 65, Glassey 66, Midler and Wollmer
68, Rao and Zionts 68, Appelgren 69.

**See, e.g., Charnes and Cooper 55, Dantzig 55b, 59, Dzielinski
and Gomory 65, Glassey 68, Rao 68, Robert 63, Rosen 67, Van Slyke and
Wets 66, Wagner 57, Wilson 66.

***See, e.g., Dantzig and Madansky 61, El Agizy 67, Van Slyke and
Wets 66, Wolfe and Dantzig 62.




Secs. 2.2, 2.3). Such approximations usually greatly enlarge the size
of the problem.*

1.2  SCOPE  OF  DISCUSSION  AND  THE  LITERATURE 

The literature on the computational aspects of large-scale
mathematical programming can be roughly dichotomized as follows:

I.  Work  aimed  at  improving  the  computational 
efficiency  of  a  known  solution  technique 
(typically  the  Simplex  Method)  for  special 
types  of  problems. 

II.  Work  aimed  at  developing  fundamentally 
new  solution  techniques. 

The  highly  specialized  nature  of  the  category  I  literature  and  the 
availability  of  several  excellent  surveys  thereon  leave  little  choice 
but  to  focus  this  paper  primarily  on  category  II.  Fortunately  this 
emphasis  would  be  appropriate  anyway,  since  category  II  is  far  more 
amorphous  and  in  need  of  clarification. 

Category  I 

The predominant context for category I contributions is the
Simplex Method for linear programming. The objective is to find, for
various special classes of problems, ways of performing each Simplex
iteration in less time or using less primary storage. This work is
in the tradition of the early and successful specialization of the
Simplex Method for transportation problems and problems with
upper-bounded variables. The two main approaches may be called inverse
compactification and mechanized pricing.

*See, e.g., Charnes and Lemke 54, Gomory and Hu 62, Kelley 60.


Inverse compactification schemes involve maintaining the basis
inverse matrix or an operationally sufficient substitute in a more
advantageous form than the explicit one. One of the earliest and most
significant examples is the "product form" of the inverse [Dantzig and
Orchard-Hays 54], which takes advantage of the sparseness of most large
matrices arising in application. Other schemes involve triangular
factorization, partitioning, or use of a "working basis" that is more
tractable than the true one. See part A of Table 1. A survey of many
such contributions is found in Sec. II of [Dantzig 68]. The interested
reader should also consult [Willoughby 69] which, in the course of
collecting a number of recent advances in the methods of dealing with
sparse matrices, points out much pertinent work done in special
application areas such as engineering structures, electrical networks, and
electric power systems. Well over a hundred references are given.


Table 1

SOME WORK AIMED AT IMPROVING THE EFFICIENCY OF THE
SIMPLEX METHOD FOR LARGE-SCALE PROBLEMS

A. Inverse Compactification

Dantzig and Orchard-Hays 54; Dantzig 55a, 55b, 63t; Markowitz 57;
Dantzig, Harvey, and McKnight 64; Heesterman and Sandee 65; Kaul
65; Bakes 66; Bennett 66; Bennett and Green 66; Saigal 66; Dantzig
and Van Slyke 67; Sakarovitch and Saigal 67; Grigoriadis 69;
Willoughby 69.

B. Mechanized Pricing(a)

Ford and Fulkerson 58; Dantzig 60; Gilmore and Gomory 61, 63, 65;
Dantzig and Johnson 64; Bradley 65, Sec. 3; Glassey 66; Tomlin 66;
Dantzig, Blattner and Rao 67; Elmaghraby 68; Lasdon and Mackey 68;
Rao 68, Sec. II; Rao and Zionts 68; Graves, Hatfield and Whinston 69;
Fox 69a.

(a) Most of the references in part C of Table 2 also use mechanized
pricing.

(b) Discussed in Sec. 3.2.
Mechanized pricing, sometimes called column generation, involves
the use of a subsidiary optimization algorithm instead of direct
enumeration to find the best nonbasic variable to enter the basis
when there are many variables.* The first contribution of this
sort was [Ford and Fulkerson 58], in which columns were generated by
a network flow algorithm. Subsequent authors have proposed generating
columns by other network algorithms, dynamic programming, integer
programming, and even by linear programming itself. See part B of Table
1. Excellent surveys of such contributions are [Balinski 64] and
[Gomory 63].
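
The pricing idea can be illustrated by a minimal sketch. It assumes a
cutting-stock setting in the spirit of [Gilmore and Gomory 61]: every
column is a cutting pattern costing one roll, dual prices for the demand
rows are taken as given, and the most attractive entering column is
found by a knapsack recursion rather than by enumerating all patterns.
The names best_pattern, reduced_cost, widths, prices, and roll_width are
hypothetical, and the data are invented for illustration.

```python
def best_pattern(widths, prices, roll_width):
    """Unbounded knapsack by dynamic programming (pricing subproblem).

    widths are positive integers; prices are the current dual prices.
    Returns (value, pattern), where pattern[i] counts the pieces of width
    widths[i] cut from one roll and value = sum(prices[i] * pattern[i]).
    """
    best = [0.0] * (roll_width + 1)    # best[c]: max value within capacity c
    choice = [-1] * (roll_width + 1)   # item added last at capacity c, or -1
    for c in range(1, roll_width + 1):
        best[c] = best[c - 1]          # default: leave one unit of width unused
        for i, w in enumerate(widths):
            if w <= c and best[c - w] + prices[i] > best[c]:
                best[c] = best[c - w] + prices[i]
                choice[c] = i
    # Trace back the choices to recover the pattern itself.
    pattern = [0] * len(widths)
    c = roll_width
    while c > 0:
        if choice[c] < 0:
            c -= 1
        else:
            pattern[choice[c]] += 1
            c -= widths[choice[c]]
    return best[roll_width], pattern


def reduced_cost(widths, prices, roll_width):
    """Reduced cost of the best column, assuming each pattern costs 1 roll."""
    value, pattern = best_pattern(widths, prices, roll_width)
    return 1.0 - value, pattern
```

With widths (3, 5, 7), prices (0.25, 0.5, 0.875), and a roll of width 10,
the recursion selects one piece of width 3 and one of width 7; the
reduced cost is negative, so under these assumptions the column would
enter the basis.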

Category I contributions of comparable sophistication are
relatively rare in the literature on nonlinear problems. It has long been
recognized that it is essential to take advantage of the recursive
nature of most of the computations; that is, one should obtain the data
required at each iteration by economically updating the data available
from the previous iteration, rather than by operating each time on the
original problem data. In Rosen's gradient projection algorithm, for
example, the required projection matrix is updated at each iteration
rather than computed ab initio. This is quite different, however, from
"compacting" the projection matrix for a particular problem structure,
or "mechanizing" the search for the most negative multiplier by means
of a subsidiary optimization algorithm. Little has been published
along these lines (see, however, p. 153 ff. and Sec. 8.3 of [Fiacco

*It is also possible to mechanize the search for the exiting
basic variable when there are many constraints (e.g., Gomory and Hu 62,
Sec. 4) or when what amounts to the Dual Method is used (e.g., Sec. 3
of Gomory and Hu 62, Abadie and Williams 63, Whinston 64, and part A
of Table 2).
and McCormick 68]). Of course, many nonlinear algorithms involve a
sequence  of  derived  linear  programs  and  therefore  can  benefit  from  the 
techniques  of  large-scale  linear  programming. 

Category  II 

We  turn  now  to  work  aimed  at  developing  new  solution  techniques 
for  various  problem  structures — the  portion  of  the  literature  to  which 
our  framework  of  fundamental  concepts  is  primarily  addressed. 

As mentioned above, the fundamental concepts are of two kinds:
problem manipulations and solution strategies. The key problem
manipulations (Sec. 2) are Dualization, Projection, Inner Linearization,
and Outer Linearization, while the key solution strategies (Sec. 3)
are Feasible Directions, Piecewise, Restriction, and Relaxation. These
building block concepts can be used to reconstruct many of the existing
computational proposals. Using Projection followed by Outer
Linearization and Relaxation, for example, we can obtain Benders'
Partitioning Procedure. Rosen's Primal Partition Programming algorithm can
be obtained by applying Projection and then the Piecewise strategy.
Dantzig-Wolfe Decomposition employs Inner Linearization and Restriction.
Similarly, many other existing computational proposals for large-scale
programming can be formulated as particular patterns of problem
manipulations and solution strategies applied to a particular structure.

See Table 2 for a classification of much of the literature of
category II in terms of such patterns. One key or representative
paper from each pattern is underlined to signify that it is discussed
in some detail in Sec. 4. Familiarity with one such paper from each
pattern should enable the reader to assimilate the other papers, given
an understanding of the fundamental concepts at the level of Secs. 2
and 3.


Table 2

CLASSIFICATION OF SOME REFERENCES BY PATTERN:
PROBLEM MANIPULATION(S)/SOLUTION STRATEGY

A. Projection, Outer Linearization/Relaxation

Benders 62; Balinski and Wolfe 63; Gomory and Hu 64,
pp. 351-354; Buzby, Stone and Taylor 65; Van Slyke and
Wets 66, Sec. 2; Weitzman 67; Geoffrion 68b, Sec. 3.

B. Projection/Piecewise

Rosen 63, 64; Rosen and Ornea 63; Beale 63; Gass 66;
Varaiya 66; Chandy 68; Geoffrion 68b, Sec. 5;
Grigoriadis and Walker 68.

C. Inner Linearization/Restriction

Dantzig and Wolfe 60; Dantzig and Madansky 61, p. 175;
Williams 62; Wolfe and Dantzig 62; Dantzig 63a, Ch.
24; Baumol and Fabian 64; Bradley 65, Sec. 2;
Dzielinski and Gomory 65; Madge 65; Tcheng 66; Tomlin
66; Whinston 66; Malinvaud 67, Sec. V; Parikh and
Shephard 67; Elmaghraby 68; Hass 68; Rao 68, Sec.
III; Robers and Ben-Israel 68; Appelgren 69.

D. Projection/Feasible Directions

Zschau 67; Abadie and Sakarovitch 67; Geoffrion 68b,
Sec. 4; Silverman 68; Grinold 69, Secs. IV and V.

E. Dualization/Feasible Directions

Uzawa 58; Takahashi 64, "local" approach; Lasdon 64,
68; Falk 65, 6); Golshtein 66; Pearson 66; Wilson 66;
Bradley 67 (Sec. 3.2), 68 (Sec. 4); Grinold 69, Sec.
III.

Table 2 does not pretend to embrace the whole literature of
category II. There undoubtedly are other papers that can naturally be
viewed in terms of the five patterns of Table 2, and there certainly
are papers employing other patterns.* Sections 2 and 3 mention other
papers that can be viewed naturally in terms of one of the problem
manipulations or solution strategies discussed there. Still other
contributions seem to employ manipulations or strategies other than
(and sometimes along with) those identified here;** regrettably, this
interesting work does not fall entirely within the scope of this effort.

Another group of papers not dealt with in the present study are
those dealing with an infinite number of variables or constraints,
although a number of contributions along these lines have been made,
particularly in the linear case; see, e.g., [Charnes, Cooper and
Kortanek 69], [Hopkins 69]. Nor do we consider the literature on
mathematical programs in continuous time (a recent contribution with
a good bibliography is [Grinold 68]), or literature on the interface
between mathematical programming and optimal control theory (e.g.,
[Dantzig 66], [Rosen 67], [Van Slyke 68]).

1.3  NOTATION 

Although  the  notation  employed  is  not  at  odds  with  customary 
usage,  the  reader  should  keep  a  few  conventions  in  mind. 

Lowercase letters are used for scalars, scalar-valued functions,
and vectors of variables or constants. Except for gradients (e.g.,


*E.g.: Inner Linearization/Relaxation: Abadie and Williams 63,
Whinston 64.
Dualization, Outer Linearization/Relaxation: Takahashi
64 ("global" approach), Geoffrion 68b (Sec. 6), Fox 69b.
Inner Linearization, Projection, Outer Linearization/
Relaxation: Metz, Howard and Williamson 66.
Dualization/Relaxation: Webber and White 68.
**E.g.: Balas 63 and 66, Ball 66, Charnes and Cooper 55, Gomory
and Hu 62 (Secs. 1 and 2), Kornai and Liptak 65, Kronsjö 68,
Orchard-Hays 68 (Ch. 12), Rech 66, Ritter 67b.


$\nabla f(x) \triangleq (\partial f/\partial x_1, \ldots, \partial f/\partial x_n)$),
all vectors are column vectors unless
transposed. Capital letters are used for matrices (A, B, etc.), sets
(X, Y, etc.) and vector-valued functions (e.g.,
$G(x) = [g_1(x), \ldots, g_m(x)]^t$). The dimension of a matrix or
vector-valued function is left
unspecified when it is immaterial to the discussion or obvious from
context. The dimension of x, however, will always be n. The symbol
"≦" is used for vector inequalities, and "≤" for scalar inequalities.
"≜" means "equal by definition to." The notation s.t., used in stating
a constrained optimization problem, means "subject to." Convex
polytope refers to the solution set of a finite system of linear equations
or inequations: it need not be a bounded set.
2. PROBLEM MANIPULATIONS: SOURCE OF "MASTER" PROBLEMS

A problem manipulation is defined to be the restatement of a given
problem in an alternative form that is essentially equivalent but more
amenable to solution. Nearly all of the so-called master problems found
in the large-scale programming literature are obtained in this way.

A very simple example of a problem manipulation is the introduction
of slack variables in linear programming to convert linear inequality
constraints into linear equalities. Another is the restatement of a
totally separable problem like (here $x_i$ may be a vector)

$$\underset{x_1,\ldots,x_k}{\text{Minimize}} \;\; \sum_{i=1}^{k} f_i(x_i) \quad \text{s.t.} \quad G_i(x_i) \geqq 0, \quad i = 1,\ldots,k$$

as k independent problems, each of the form

$$\underset{x_i}{\text{Minimize}} \;\; f_i(x_i) \quad \text{s.t.} \quad G_i(x_i) \geqq 0.$$

This manipulation crops up frequently in large-scale optimization, and
will be called separation.

These examples, although mathematically trivial, do illustrate the
customary purpose of problem manipulation: to permit existing
optimization algorithms to be applied where they otherwise could not, or to take
advantage in some way of the special structure of a particular problem.
The first example permits the classical Simplex Method, which deals
directly only with equality constraints, to be applied to linear
programs with inequality constraints. The second example enables solving
a totally separable problem by the simultaneous solution of smaller
problems. Even if the smaller problems are solved sequentially rather
than simultaneously, a net advantage is still probable since for most
solution methods the amount of work required increases much faster than
linearly with problem size.
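
The effect of separation can be seen in a minimal sketch. It assumes a
hypothetical instance with k = 3 scalar variables restricted to a small
integer grid, objective terms f_i(x_i) = (x_i - c_i)^2, and constraints
G_i(x_i) = x_i - l_i >= 0; the names GRID, CENTERS, LOWER, solve_jointly,
and solve_separately are illustrative only. The joint search examines
7^3 = 343 grid points, the separated search only 3 x 7 = 21.

```python
from itertools import product

GRID = range(-3, 4)      # candidate values for each x_i (an assumption)
CENTERS = [-2, 0, 1]     # hypothetical c_i in f_i(x) = (x - c_i)**2
LOWER = [-1, -3, 2]      # hypothetical l_i in G_i(x) = x - l_i >= 0


def f(i, x):
    return (x - CENTERS[i]) ** 2


def feasible(i, x):
    return x - LOWER[i] >= 0


def solve_jointly():
    """Brute-force search over the joint space of (x_1, ..., x_k)."""
    best = None
    for xs in product(GRID, repeat=len(CENTERS)):
        if all(feasible(i, x) for i, x in enumerate(xs)):
            value = sum(f(i, x) for i, x in enumerate(xs))
            if best is None or value < best[0]:
                best = (value, xs)
    return best


def solve_separately():
    """Solve the k independent subproblems and assemble the answer."""
    xs = []
    for i in range(len(CENTERS)):
        candidates = [x for x in GRID if feasible(i, x)]
        xs.append(min(candidates, key=lambda x: f(i, x)))
    return sum(f(i, x) for i, x in enumerate(xs)), tuple(xs)
```

On this instance both routines return the same optimal value and the
same minimizer, as separation requires.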

More  specifically,  the  three  main  objectives  of  problem 
manipulation  in  large-scale  programming  seem  to  be: 

(a)  to  isolate  familiar  special  structures  imbedded  in  a  given 
problem  (so  that  known  efficient  algorithms  appropriate  to 
these  structures  can  be  used); 

(b)  to  induce  linearity  in  a  partly  nonlinear  problem  via 
judicious  approximation  (so  that  the  powerful  linear 
programming  algorithms  can  be  used); 

(c)  to  induce  separation. 

We shall discuss in detail three potent devices frequently used in
pursuit of these objectives: Projection, Inner Linearization, and
Outer Linearization.

Projection (Sec. 2.1), sometimes known as "partitioning" or
"parameterization", is a device which takes advantage in certain
problems of the relative simplicity resulting when certain variables are
temporarily fixed in value. In [Benders 62] it is used for objective
(a) above to isolate the linear part of a "semilinear" program (see Sec.
4.1), while in [Rosen 64] it is used to induce separation (see Sec. 4.2).

Inner Linearization (Sec. 2.2) and Outer Linearization (Sec. 2.3)
are devices for objective (b) long used in nonlinear programming.
Inner Linearization goes back at least to [Charnes and Lemke 54],
in which a convex function of one variable is approximated by a
piecewise-linear convex function. Outer Linearization involves tangential
approximation to convex functions as in [Kelley 60] (see Sec. 3.3).

Both devices have important uses in large-scale programming. Inner
Linearization is the primary problem manipulation used in the famous
Dantzig-Wolfe Decomposition method of linear and nonlinear
programming (Sec. 4.3). One important use of Outer Linearization is as a
means of dealing with nonlinearities introduced by Projection
(Sec. 4.1).

Perhaps the most conspicuous problem manipulation not discussed
here is Dualization. Long familiar in the context of linear programs,
dualization of nonlinear programs* is especially valuable in pursuit
of objectives (a) and (c). This significant omission is made because
of space considerations, and also to keep the presentation as
elementary as possible. One algorithm relying on nonlinear dualization is
mentioned in Sec. 4.5; see also part E of Table 2 and [Geoffrion 68b,
Sec. 6.1].

Other problem manipulations not discussed here, mostly quite
specialized, can be found playing conspicuous roles in [Charnes and Cooper
55], [El Agizy 67], [Gomory and Hu 62], [Weil and Kettler 68].

We now proceed to discuss Projection and Inner and Outer
Linearization. Section 3 will discuss the solution strategies that can be
applied subsequent to these and other problem manipulations. The
distinction between problem manipulations and solution strategies is that the
former replaces an optimization problem by one that is essentially
equivalent to it, while the latter replaces a problem by a recursive
sequence of related but much simpler optimization problems.

*See, e.g., Rockafellar 68, Geoffrion 69.


2.1 PROJECTION


The problem

(2.1)  $\underset{x \in X,\; y \in Y}{\text{Maximize}} \;\; f(x,y) \quad \text{s.t.} \quad G(x,y) \geqq 0$

involves optimization over the joint space of the x and y variables.
We define its projection onto the space of the y variables alone as


(2.2)  $\underset{y \in Y}{\text{Maximize}} \;\; \Big[ \sup_{x \in X} f(x,y) \quad \text{s.t.} \quad G(x,y) \geqq 0 \Big]$

The maximand of (2.2) is the entire bracketed quantity (call it v(y)),
which is evaluated, for fixed y, as the supremal value of an "inner"
maximization problem in the variables x. We define v(y) to be $-\infty$ if
the inner problem is infeasible. The only constraint on y in (2.2)
is that it must be in Y, but obviously to be a candidate for the
optimal solution y must also be such that the inner problem is feasible,
i.e., y must be in the set


(2.3)  $V \triangleq \{y : v(y) > -\infty\} = \{y : G(x,y) \geqq 0 \text{ for some } x \in X\}.$

Thus we may rewrite (2.2) as

(2.4)  $\underset{y \in Y \cap V}{\text{Maximize}} \;\; v(y).$

The set V can be thought of as the projection of the constraints
$x \in X$ and $G(x,y) \geqq 0$ onto the space of the y variables alone. It is
depicted for a simple case in Fig. 1; X is an interval, the set
$\{(x,y) : G(x,y) \geqq 0\}$ is shaded, and the resulting V is an interval.

Fig. 1--Depiction of the set V

It is often possible to obtain a more conventional and tractable
representation of V than the definitional one. See, for example,
the inequalities (4.5) of Sec. 4.1 (cf. [Kohler 67]).

The relationship between the original problem (2.1) and its
projection (2.4) is as follows.* The proof is elementary.

Theorem 1. Problem (2.1) is infeasible or has unbounded
value if and only if the same is true of (2.4). If $(x^0, y^0)$
is optimal in (2.1), then $y^0$ must be optimal in (2.4). If
$y^0$ is optimal in (2.4) and $x^0$ achieves the supremum of
$f(x, y^0)$ subject to $x \in X$ and $G(x, y^0) \geqq 0$, then $x^0$ together
with $y^0$ is optimal in (2.1).

*One may read (2.2) for (2.4) in Theorem 1, except that (2.2) can
be feasible with value $-\infty$ when (2.1) is infeasible.
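
The relationship stated in Theorem 1 can be checked numerically on a
small instance. The sketch below assumes finite integer stand-ins for
the sets X and Y and a hypothetical objective f and constraint function
G chosen only for illustration; none of these data come from the
report. It evaluates v(y) by solving the inner problem for each fixed y,
forms the set V, and compares the projected optimum with the joint one.

```python
NEG_INF = float("-inf")

X = range(0, 11)    # hypothetical finite stand-in for the set X
Y = range(0, 11)    # hypothetical finite stand-in for the set Y


def f(x, y):
    return 6 * x + 10 * y - x * x - y * y


def G(x, y):
    return 8 - x - 2 * y          # the constraint is G(x, y) >= 0


def v(y):
    """Value of the inner problem for fixed y; -inf if it is infeasible."""
    values = [f(x, y) for x in X if G(x, y) >= 0]
    return max(values) if values else NEG_INF


def solve_joint():
    """Problem (2.1): search the joint space of x and y."""
    return max((f(x, y), x, y) for x in X for y in Y if G(x, y) >= 0)


def solve_projected():
    """Problem (2.2): maximize v over Y, then recover an inner maximizer."""
    y_best = max(Y, key=v)
    x_best = max((x for x in X if G(x, y_best) >= 0),
                 key=lambda x: f(x, y_best))
    return v(y_best), x_best, y_best


# The set V of (2.3): those y for which the inner problem is feasible.
V = [y for y in Y if v(y) > NEG_INF]
```

For these data V consists of y = 0, ..., 4, and both routines return the
same optimal value and the same optimal pair, in agreement with the
theorem.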


It  should  be  emphasized  that  Projection  is  a  very  general 
manipulation--no  special  assumptions  on  X,  Y,  f,  or  G  are  required 
for  Th.  1  to  hold,  and  any  subset  of  variables  whatsoever  can  be 
designated  to  play  the  role  of  y.  When  convexity  assumptions  do  hold, 
however,  the  following  theorem  shows  that  (2.2)  is  a  concave  program. 

Theorem  2.  Assume  that  X  and  Y  are  convex  sets,  and  that  f  and 
each  component  of  G  are  concave  on  X  x  Y.  Then  v  is  concave 
on  Y. 

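Proof. Fix $y^0, y' \in Y$ and $0 < \theta < 1$ arbitrarily. Let
$\bar\theta = 1 - \theta$. Then

$$
\begin{aligned}
v(\theta y^0 + \bar\theta y')
&= \sup_{x^0, x' \in X} f(\theta x^0 + \bar\theta x', \theta y^0 + \bar\theta y')
   \quad \text{s.t.} \quad G(\theta x^0 + \bar\theta x', \theta y^0 + \bar\theta y') \geqq 0 \\
&\ge \sup_{x^0, x' \in X} f(\theta x^0 + \bar\theta x', \theta y^0 + \bar\theta y')
   \quad \text{s.t.} \quad G(x^0, y^0) \geqq 0, \; G(x', y') \geqq 0 \\
&\ge \sup_{x^0, x' \in X} \theta f(x^0, y^0) + \bar\theta f(x', y')
   \quad \text{s.t.} \quad G(x^0, y^0) \geqq 0, \; G(x', y') \geqq 0 \\
&= \theta v(y^0) + \bar\theta v(y'),
\end{aligned}
$$

where the equality or inequality relations follow, respectively, from
the convexity of X, the concavity of G, the concavity of f, and
separability in $x^0$ and $x'$. ∎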


Since  V  is  easily  shown  to  be  a  convex  set  when  v  is  concave, 
it  follows  under  the  hypotheses  of  Theorem  2  that  (2.4)  is  also  a 
concave  program. 
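
A small exact check of Theorem 2 is sketched below under stated
assumptions: X = [0, 1], Y = [0, 2], f(x, y) = x - y^2, and the single
constraint G(x, y) = y - x >= 0, all hypothetical. Because f increases
in x, the inner supremum is attained at x = min(1, y), which gives
v(y) = min(1, y) - y^2 for this example only. Rational arithmetic is
used so that the concavity inequality is tested without rounding error.

```python
from fractions import Fraction
from itertools import product


def v(y):
    """Inner problem: sup of x - y**2 over 0 <= x <= 1 with y - x >= 0.

    The objective increases in x, so the supremum is at x = min(1, y);
    this closed form is specific to the illustrative data above.
    """
    x_star = min(Fraction(1), y)
    return x_star - y * y


def concavity_violations(points, thetas):
    """List every (y0, y1, theta) at which the concavity inequality fails."""
    bad = []
    for y0, y1, t in product(points, points, thetas):
        mix = t * y0 + (1 - t) * y1
        if v(mix) < t * v(y0) + (1 - t) * v(y1):
            bad.append((y0, y1, t))
    return bad


POINTS = [Fraction(k, 10) for k in range(21)]    # sample of Y = [0, 2]
THETAS = [Fraction(j, 4) for j in range(1, 4)]   # theta = 1/4, 1/2, 3/4
```

No violations are found on this sample, as the theorem predicts; the
check is of course only a finite spot test, not a proof.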

Projection  is  likely  to  be  a  useful  manipulation  when  a  problem 
is  significantly  simplified  by  temporarily  fixing  the  values  of  certain 
variables.  In  [Benders  62],  (2.1)  is  a  linear  program  for  fixed  y 
(see  Sec.  4.1).  In  [Rosen  64],  (2.1)  is  a  separable  linear 
program  for  fixed  y  (see  Sec.  4.2).  See  Table  2  for  numerous  other 
instances  in  which  Projection  plays  an  important  role. 

It is interesting to note that Projection can be applied sequentially
by first projecting onto a subset of the variables, then onto a subset
of these, and so on. The result is a dynamic-programming-like
reformulation [Bellman 57], [Dantzig 59, p. 61 ff.], [Nemhauser 64]. Many dynamic
programming problems can fruitfully be viewed in terms of sequential
projection, and conversely, but we shall not pursue this matter
here.

It may seem that the maximand of the projected problem (2.2) is
excessively burdensome to deal with. And indeed it may be, but the
solution strategies of Sec. 3 enable many applications of Projection
to be accomplished successfully. The key strategies seem to be
Relaxation preceded by Outer Linearization (cf. Sec. 4.1), the
Piecewise strategy (cf. Sec. 4.2), and Feasible Directions (cf. Sec. 4.4).

Of course if y is only one-dimensional, (2.2) can be solved in a
parametric fashion [Joksch 64], [Ritter 67a].
2.2 INNER LINEARIZATION

Inner  Linearization  is  an  approximation  applying  both  to  convex 
or  concave  functions  and  to  convex  sets.  It  is  conservative  in  that 
it  does  not  underestimate  (overestimate)  the  value  of  a  convex  (concave) 
function,  or  include  any  points  outside  of  an  approximated  convex  set. 

An example of Inner Linearization applied to a convex set X
in two dimensions is given in Fig. 2, where X has been approximated
by the convex hull of the points $x^1, \ldots, x^5$ lying within it. X has
been linearized in the sense that the approximating set is a convex
polytope (which, of course, can be specified by a finite number of
linear inequalities). The points $x^1, \ldots, x^5$ are called the base.

The  accuracy  of  the  approximation  can  be  made  as  great  as  desired  by 
making  the  density  of  the  base  sufficiently  high. 


Fig. 2--Inner Linearization of a convex set

An example of Inner Linearization applied to a function of one
variable is given in Fig. 3, where the function f has been approximated
on the interval $[x^1, x^5]$ by a piecewise-linear function (represented
by the dotted line) that accomplishes linear interpolation between
the values of f at the base points $x^1, \dots, x^5$. The approximation
is "inner" in the sense that the epigraph of the approximating function
lies entirely within the epigraph of the approximated function. (The
epigraph of a convex (concave) function is the set of all points lying
on or above (below) the graph of the function.)
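
The one-variable case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function $f(x) = x^2$, the five-point base on $[0, 4]$, and the helper name `inner_linearize` are hypothetical and do not come from the report. The sketch builds the linear interpolant over the base and checks that it never falls below f.

```python
from bisect import bisect_right

def inner_linearize(f, base):
    """Return the piecewise-linear interpolant of f over the sorted base points."""
    xs = sorted(base)
    ys = [f(x) for x in xs]

    def approx(x):
        if not xs[0] <= x <= xs[-1]:
            raise ValueError("x lies outside the interval spanned by the base")
        # Locate the segment [xs[k], xs[k + 1]] that contains x.
        k = min(bisect_right(xs, x) - 1, len(xs) - 2)
        # Weight on the right end point; both weights are nonnegative and sum to 1.
        w = (x - xs[k]) / (xs[k + 1] - xs[k])
        return (1.0 - w) * ys[k] + w * ys[k + 1]

    return approx

# Hypothetical data: a convex function and a five-point base on [0, 4].
f = lambda x: x * x
base = [0.0, 1.0, 2.0, 3.0, 4.0]
f_inner = inner_linearize(f, base)

# The interpolant agrees with f at base points and lies on or above f elsewhere.
grid = [i / 10 for i in range(41)]
worst_gap = max(f_inner(x) - f(x) for x in grid)
never_under = all(f_inner(x) >= f(x) - 1e-12 for x in grid)
```

Adding base points between the existing ones shrinks `worst_gap`, which is the sense in which a denser base gives a more accurate approximation.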


Fig. 3--Inner Linearization of a convex function


Let us further examine these two graphical examples of Inner
Linearization in the context of the special problem

$$\operatorname*{Minimize}_{x \in X} \; f(x) \quad \text{s.t.} \quad G(x) \le 0, \tag{2.5}$$

where n = 2, X is a convex set, and all functions are convex.
Inner-linearizing X as in Fig. 2 yields the approximation


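$$\operatorname*{Minimize}_{\alpha \ge 0} \; f\Big(\sum_{j=1}^{5} \alpha^j x^j\Big) \quad \text{s.t.} \quad G\Big(\sum_{j=1}^{5} \alpha^j x^j\Big) \le 0, \quad \sum_{j=1}^{5} \alpha^j = 1. \tag{2.6}$$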


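Note that the x variables are replaced by the "weighting" variables
$\alpha^j$, one for each chosen base point in X. Inner-linearizing f now as
in the two-dimensional analog of Fig. 3 yields

$$\operatorname*{Minimize}_{\alpha \ge 0} \; \sum_{j=1}^{5} \alpha^j f(x^j) \quad \text{s.t.} \quad G\Big(\sum_{j=1}^{5} \alpha^j x^j\Big) \le 0, \quad \sum_{j=1}^{5} \alpha^j = 1. \tag{2.7}$$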
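
A small worked instance may help fix ideas about a problem of the form (2.7). The data are hypothetical and chosen only for easy arithmetic: a single variable instead of two, $f(x) = x^2$, $G(x) = 1.5 - x$, and a three-point base $x^1 = 0$, $x^2 = 1$, $x^3 = 2$.

```latex
% Inner-linearized problem for the hypothetical data above:
\begin{aligned}
\operatorname*{Minimize}_{\alpha \ge 0} \quad & 0\,\alpha^1 + 1\,\alpha^2 + 4\,\alpha^3 \\
\text{s.t.} \quad & 1.5 - (\alpha^2 + 2\alpha^3) \le 0, \qquad
                    \alpha^1 + \alpha^2 + \alpha^3 = 1 .
\end{aligned}
% Substituting \alpha^2 = 1.5 - 2\alpha^3 on the active constraint gives cost
% 1.5 + 2\alpha^3, and \alpha^1 \ge 0 forces \alpha^3 \ge 1/2. Hence
\alpha = \bigl(0, \tfrac12, \tfrac12\bigr), \qquad
\text{value } \tfrac52, \qquad
x = \tfrac12 \cdot 1 + \tfrac12 \cdot 2 = \tfrac32 .
% The original problem has optimum x = 3/2 with f(3/2) = 9/4 \le 5/2,
% so the approximation is conservative, as expected.
```

Here the constraint is linear, so it is reproduced exactly and all of the error comes from the objective; adding $3/2$ to the base would remove it.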


We have taken the bases for the approximations to X and f to coincide,
since normally only one base is introduced for a given problem. An
exception to this general rule may occur, however, when some of the
functions are separable, for then it may be desirable to introduce
different bases for different subsets of variables. Suppose, for
example, that $f(x) = f_1(x_1) + f_2(x_2)$, $X = R^2$, and that we wish to
use $\langle x_1^1, \dots, x_1^4 \rangle$ as a base for inner-linearizing $f_1$ and
$\langle x_2^1, \dots, x_2^6 \rangle$ as a base for $f_2$. Then the corresponding
approximation to (2.5) would be

$$\operatorname*{Minimize}_{\alpha_1,\, \alpha_2 \ge 0} \; \sum_{j=1}^{4} \alpha_1^j f_1(x_1^j) + \sum_{j=1}^{6} \alpha_2^j f_2(x_2^j) \quad \text{s.t.} \quad G\Big(\sum_{j=1}^{4} \alpha_1^j x_1^j, \; \sum_{j=1}^{6} \alpha_2^j x_2^j\Big) \le 0, \quad \sum_{j=1}^{4} \alpha_1^j = 1 \ \text{and} \ \sum_{j=1}^{6} \alpha_2^j = 1. \tag{2.8}$$

Problems (2.6), (2.7) and (2.8) are all convex programs.

The general nature of Inner Linearization should be clear from
these examples. It is important to appreciate that there is a great
deal of flexibility in applying Inner Linearization--both as to
which sets and functions are inner-linearized, and as to which base
is used. Inner-linearizing everything results, of course, in a linear
program, although it is by no means necessary to inner-linearize everything
(see Sec. 4.3). The base can be chosen to approximate the set
of points satisfying any subset whatever of the given constraints; the
constraints in the selected subset are replaced by the simple nonnegativity
conditions on the weighting variables plus the normalization
constraint, while the remaining constraints are candidates for functional
Inner  Linearization  with  respect  to  the  chosen  base.  Or,  if  desired,  the 
base  can  be  chosen  freely  from  the  whole  space  of  the  decision  variables 
(this  can  be  thought  of  as  corresponding  to  the  selection  of  an  empty 
set  of  constraints).  Each  of  the  given  constraints,  then,  is  placed 
into  one  of  three  categories,  any  of  which  may  be  empty:  the  constraints 
defining  the  convex  set  approximated  by  the  chosen  base,  those  that 
are  inner-linearized  over  the  base,  and  all  others. 

Inner Linearization has long been used for convex (or concave)
functions of a single variable [Charnes and Lemke 54]. It has also
been used for non-convex functions of a single variable [Miller 63].
Techniques based on this manipulation are sometimes called "separable
programming" methods because they deal with functions that are linearly
separable into functions of one variable (e.g., $f(x) \triangleq \sum_{i=1}^{n} f_i(x_i)$).

It is easy to determine--perhaps graphically--an explicit base
yielding as accurate an inner-linearization as desired for a given
function of one variable. It is much more difficult, however, to do
this for functions of many variables. Even if a satisfactory base
could be determined, it would almost certainly contain a large number
of points. This suggests the desirability of having a way to generate
base points as actually needed in the course of computationally solving
the inner-linearized problem. Hopefully it should be necessary to
generate  only  a  small  portion  of  the  entire  base,  with  many  of  the 
generated  points  tending  to  cluster  about  the  true  optimal  solution. 
Indeed  there  is  a  way  to  do  this  based  on  the  solution  strategy  we  call 
Restriction  (Sec.  3.2).  The  net  effect  is  that  the  Inner  Linearization 
manipulation  need  only  be  done  implicitly!  Dantzig  and  Wolfe  were  the 
originators  of  this  exceedingly  clever  approach  to  nonlinear  programming 
[Dantzig  63a,  Ch.  24];  we  shall  review  this  development  in  Sec.  4.3. 

An important special case in which Inner Linearization can be used
very elegantly concerns convex polytopes (the polytope could be the
epigraph of a piecewise-linear convex function). Inner Linearization
introduces no error at all in this case if the base is taken to coincide
with the extreme points.* As above, the extreme points can be
generated as needed if the implicitly inner-linearized problem is
solved by Restriction. This is the idea behind the famous Decomposition
Principle for linear programming [Dantzig and Wolfe 60], which is
reviewed in Sec. 4.3.

For  ease  of  reference  in  the  sequel,  the  well-known  theorem 
asserting  the  exactness  of  Inner  Linearization  for  convex  polytopes 
[Goldman  56]  is  recorded. 

*It  is  also  necessary,  of  course,  to  introduce  the  extreme  rays 
if the polytope is unbounded.

Theorem 3. Any nonempty convex polytope $X \triangleq \{x : Ax \le b\}$ can be
expressed as the vector sum $\mathcal{P} + C$ of a bounded convex polyhedron $\mathcal{P}$
and a cone $C \triangleq \{x : Ax \le 0\}$. $\mathcal{P}$ in turn can be expressed as
the convex hull of its extreme vectors $\langle y_1, \dots, y_p \rangle$, and C can
be expressed as the nonnegative linear combinations of a finite
set of spanning vectors $\langle z_1, \dots, z_q \rangle$. (If $\mathcal{P}$ (respectively C)
consists of only the 0-vector, take p (respectively q) equal to 0.)
Thus there exist vectors $\langle y_1, \dots, y_p; z_1, \dots, z_q \rangle$ such that
$x \in X$ if and only if

$$x = \sum_{i=1}^{p} \alpha_i y_i + \sum_{i=1}^{q} \beta_i z_i$$

for some nonnegative scalars $\alpha_1, \dots, \alpha_p, \beta_1, \dots, \beta_q$ such that
$\sum_{i=1}^{p} \alpha_i = 1$. Moreover, if the rank of A equals n (the number
of its columns), then a representation with a minimal number of
vectors is obtained by letting the $y_i$'s be the extreme vectors
of X and by letting the $z_i$'s be distinct nonzero vectors in each
of the extreme rays of C; this minimal representation is unique
up to positive multiples of the $z_i$'s.
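
As an illustration of the representation in Theorem 3, consider a small unbounded polytope in the plane. The set below is a hypothetical example, written in the form $Ax \le b$ as an assumption about the sign convention.

```latex
% Hypothetical polytope: x_1 >= 0, x_2 >= 0, x_1 + x_2 >= 1, written as Ax <= b.
X = \Bigl\{ x \in R^2 :
      \begin{pmatrix} -1 & 0 \\ 0 & -1 \\ -1 & -1 \end{pmatrix} x
      \le \begin{pmatrix} 0 \\ 0 \\ -1 \end{pmatrix} \Bigr\},
\qquad
C = \{ x : Ax \le 0 \} = \{ x : x \ge 0 \}.
% Extreme vectors of X and one vector on each extreme ray of C:
y_1 = (1, 0), \quad y_2 = (0, 1), \qquad z_1 = (1, 0), \quad z_2 = (0, 1).
% Every x in X is a convex combination of the y's plus a nonnegative
% combination of the z's; for instance
(2, \tfrac12) = \tfrac12\, y_1 + \tfrac12\, y_2 + \tfrac32\, z_1 + 0\, z_2 .
```

The rank of A is 2, the number of its columns, so this is a representation with a minimal number of vectors, unique up to positive multiples of $z_1$ and $z_2$.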


It should be noted that in mathematical programming the rank of
A usually equals n, since nonnegativity constraints on the variables
are usually included in X. If this is not the case, then X can
always be imbedded in the nonnegative orthant of $R^{n+1}$ by a simple
linear transformation (viz., put $x_i = y_i - y_0$, where $y_i \ge 0$, $i = 0, \dots, n$).

There are also results having to do with economical inner linearizations
of nonpolyhedral sets. For example, there is the Theorem of
Krein and Milman [Berge 63, p. 167] that every closed, bounded, nonempty
convex set is the convex hull of its extreme points. Usually,
however, it suffices to know that a convex set or function can be
represented as accurately as desired by Inner Linearization if a sufficiently
dense base is chosen.
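
The substitution $x_i = y_i - y_0$ that places an arbitrary point in the nonnegative orthant of one higher dimension can be sketched as follows. The helper names `embed` and `recover` and the sample point are hypothetical; the choice of $y_0$ shown is only one of many valid ones, since the representation is not unique.

```python
def embed(x):
    """Map x in R^n to y = (y_0, y_1, ..., y_n) >= 0 with x_i = y_i - y_0."""
    y0 = max(0.0, -min(x))          # just large enough to offset the most negative entry
    return [y0] + [xi + y0 for xi in x]

def recover(y):
    """Invert the embedding: x_i = y_i - y_0."""
    return [yi - y[0] for yi in y[1:]]

x = [3.0, -2.0, 0.5]                # hypothetical point with a negative component
y = embed(x)                        # every component of y is nonnegative
x_back = recover(y)
```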

2.3  OUTER  LINEARIZATION 

Outer Linearization is complementary in nature to Inner Linearization,
and also applies both to convex (or concave) functions and to
convex sets.

An example as applied to a convex set in two dimensions is given
by Fig. 4, where X has been approximated by a containing convex polytope
that is the intersection of a finite collection of containing half-spaces.
The first three of these are actually supporting half-spaces that pass,
respectively, through the points $x^1$, $x^2$, and $x^3$ on the boundary of X.


Fig. 4--Outer Linearization of a convex set

An example as applied to a function of one variable is given in
Fig. 5, where the function f has been approximated by the piecewise-linear
function that is the upper envelope, or pointwise maximum, of
the linear supporting functions $s_1(x), \dots, s_5(x)$ associated with the
points $x^1, \dots, x^5$. A linear support for a convex function f at the
point $\bar{x}$ is defined as a linear function with the property that it
nowhere exceeds f in value, and equals f in value at $\bar{x}$.† The epigraph
of the approximating function contains the epigraph of the approximated
function when Outer Linearization is used.
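
A minimal Python sketch of this construction follows, assuming a differentiable convex function so that each linear support is a tangent line. The function $f(x) = x^2$, its derivative, the five support points, and the helper name `outer_linearize` are illustrative assumptions only.

```python
def outer_linearize(f, df, points):
    """Return the upper envelope of the linear supports of f at the given points."""
    # Each support is s(x) = f(p) + df(p) * (x - p), stored as (slope, intercept).
    supports = [(df(p), f(p) - df(p) * p) for p in points]

    def approx(x):
        return max(a * x + b for a, b in supports)

    return approx

# Hypothetical data: a convex function, its derivative, and five support points.
f = lambda x: x * x
df = lambda x: 2.0 * x
points = [0.0, 1.0, 2.0, 3.0, 4.0]
f_outer = outer_linearize(f, df, points)

# The envelope agrees with f at the support points and lies on or below f elsewhere.
grid = [i / 10 for i in range(41)]
never_over = all(f_outer(x) <= f(x) + 1e-12 for x in grid)
worst_gap = max(f(x) - f_outer(x) for x in grid)
```

For this particular quadratic the largest underestimate on the grid occurs midway between neighboring support points.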


Fig. 5--Outer Linearization of a convex function

Obviously Outer Linearization is opposite to Inner Linearization
in that it generally underestimates (overestimates) the value of a
convex (concave) function, and includes not only the given convex set
but points outside as well. The notion of conjugacy (see, e.g.,
[Rockafellar 68]) is a logical extension, but need not be pursued here.

†If f is differentiable at $\bar{x}$, then $f(\bar{x}) + \nabla f(\bar{x})(x - \bar{x})$ is a
linear support at $\bar{x}$.

That Outer Linearization truly linearizes a convex program like

$$\operatorname*{Minimize}_{x \in X} \; f(x) \quad \text{s.t.} \quad G(x) \le 0 \tag{2.9}$$

should be clear. The approximation of X by a containing convex polytope
can only introduce linear constraints; the approximation of $g_i$
by the pointwise maximum of a collection of $p_i$ linear supports, say,
obviously leads to $p_i$ linear inequalities; and the approximation of f
by the pointwise maximum of p linear supports leads to p additional
linear inequalities after one invokes the elementary manipulation of
minimizing an upper bound on f in place of f itself.* If all nonlinear
functions are dealt with in this fashion, the approximation to (2.9)
is a linear program.

As with Inner Linearization, there is great latitude concerning which
sets and functions are to be outer-linearized, and which approximants††
are to be used. In general, the objective function may or may not be
outer-linearized, and each constraint is placed into one of three
categories: the ones that together define a convex set to be outer-linearized,
the ones that are outer-linearized individually, and the
ones that are not outer-linearized at all.

*E.g., $\min_{x \in X} \max_i \{s_i(x)\} = \min_{x \in X,\, \sigma} \sigma$ s.t. $\sigma \ge s_i(x)$, all i.

††For the sake of unified terminology, we use the term approximant
for a containing or supporting half-space of a convex set, and also for
a linear bounding function or linear support of a convex function.

The main obstacle faced with Outer Linearization is that an
excessive number of approximants may be required for an adequate approximation,
especially for sets in more than two dimensions and functions
of more than one variable. Fortunately, it turns out that it is usually
possible to circumvent this difficulty, for there is a solution strategy
applicable to the outer-linearized problem that enables approximants to
be generated economically as needed without having to specify them in
advance. We call this strategy Relaxation. The net effect is that the
Outer Linearization manipulation need only be done implicitly. Two pioneering
papers on this approach to nonlinear programming are [Kelley 60]
and [Dantzig and Madansky 61]. Relaxation and the first of these papers
are discussed in Sec. 3.3.
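
The idea of generating approximants only as needed can be sketched for a function of one variable. The fragment below is a simplified cutting-plane loop in the spirit of the approach cited above, not a statement of the Relaxation strategy as the report develops it; the function name `kelley_1d`, the test function, the interval, and the tolerance are all assumptions made for illustration.

```python
def kelley_1d(f, df, lo, hi, tol=1e-8, max_iter=200):
    """Minimize a convex differentiable f on [lo, hi], adding linear supports as needed."""
    cuts = []                       # each cut is (slope, intercept) of a linear support
    x = lo                          # arbitrary starting point
    for _ in range(max_iter):
        cuts.append((df(x), f(x) - df(x) * x))
        model = lambda z: max(a * z + b for a, b in cuts)
        # The minimum of the piecewise-linear model over [lo, hi] is attained at an
        # end point or where two cuts cross, so it suffices to enumerate those points.
        candidates = [lo, hi]
        for i, (a1, b1) in enumerate(cuts):
            for a2, b2 in cuts[i + 1:]:
                if a1 != a2:
                    z = (b2 - b1) / (a1 - a2)
                    if lo <= z <= hi:
                        candidates.append(z)
        x = min(candidates, key=model)
        # model(x) is a lower bound on the optimal value and f(x) an upper bound.
        if f(x) - model(x) <= tol:
            break
    return x, f(x), len(cuts)

# Hypothetical test problem: minimize (x - 1.3)^2 on [-2, 4].
f = lambda x: (x - 1.3) ** 2
df = lambda x: 2.0 * (x - 1.3)
x_star, f_star, n_cuts = kelley_1d(f, df, -2.0, 4.0)
```

Only the supports actually generated enter the model, and they tend to accumulate near the minimizer, which is the behavior the text anticipates.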

In large-scale programming, Outer Linearization is especially important
in conjunction with Projection and Dualization. See, for example,
the discussion of [Benders 62] in Sec. 4.1.

Approximation by Outer Linearization naturally raises the question
of the existence of a supporting approximant at a given point. The
main known result along these lines is that every boundary point of a
convex set in $R^n$ must have at least one supporting half-space passing
through it. It follows that every closed convex set can be represented
as the intersection of its supporting half-spaces [Berge 63, p. 166].†
It also follows that every convex (or concave) function with a closed
epigraph has a supporting half-space to its epigraph at every point where
the function is finite. Unfortunately, this is not quite the same as the
existence of a linear support at every such point, since the supporting
half-space may be "vertical" when viewed as in Fig. 5. Various mild
conditions could be imposed to preclude this kind of exceptional
behavior, but for most purposes one may avoid the difficulty by
simply working directly with the epigraph of a convex function.

†Of course, a convex polytope by definition admits an exact
outer-linearization using only a finite number of approximants.

3.  SOLUTION  STRATEGIES:  SOURCE  OF  "SUBPROBLEMS"


The previous section described several prominent problem manipulations
for restating a given problem in a more or less equivalent form.
The result is often referred to in specific applications as a "master"
problem. Typically one then applies a solution strategy designed
to facilitate optimization by reduction to a sequence of simpler
optimization problems. Quite often this leads to subproblems amenable
to solution by specialized algorithms. There are perhaps a half
dozen principal solution strategies, each applicable to a variety
of problems and implementable in a variety of ways. This section presents
three such strategies that seem to be especially useful for large-scale
problems: the so-called Piecewise, Restriction and Relaxation
strategies. See Table 2 for a classification of many known algorithms
in terms of the solution strategy they can be viewed as using.

The  Piecewise  strategy  is  appropriate  for  problems  that  are 
significantly  simpler  if  their  variables  are  temporarily  restricted  to 
certain  regions  of  their  domain.  The  domain  is  (implicitly)  subdivided 
into  such  regions,  and  the  problem  is  solved  by  considering  the  regions 
one at a time. Usually it is necessary to consider only a small fraction
of all possible regions explicitly. The development of the Piecewise
strategy for large-scale programming is largely due to J. B. Rosen,
whose  various  Partition  Programming  algorithms  invoke  it  subsequent 
to  the  Projection  manipulation. 

Restriction is often appropriate for problems with a large number
of nonnegative variables. It enables reduction to a recursive sequence
of problems in which most of the variables are fixed at zero. The
Simplex Method itself turns out to be a special form of Restriction
for linear programming, although the strategy also applies to nonlinear
problems. Restriction is almost always used if Inner Linearization has
been applied.

Relaxation  is  useful  for  problems  with  many  inequality  constraints. 
It  reduces  such  a  problem  to  a  recursive  sequence  of  problems  in  which 
many of these constraints are ignored. The Dual Method of linear programming
is a special form of Relaxation, although the strategy applies
equally  well  to  nonlinear  problems.  Outer  Linearization  is  almost 
always  followed  by  Relaxation. 

Perhaps the most important solution strategy not discussed here
is the well-known Feasible Direction strategy [Zoutendijk 60], which
reduces a problem with differentiable functions to a sequence of one-dimensional
optimization problems along carefully chosen directions.
Most of the more powerful primal nonlinear programming algorithms
utilize this strategy, but their application to large-scale problems
is frequently hampered by non-differentiability (if Dualization or
Projection is used) if not by sheer size (especially if Inner or Outer
Linearization is used). See Sec. 4.4 for an instance in which the
first obstacle can be surmounted.

We have also omitted discussion of the Penalty strategy (e.g.,
[Fiacco and McCormick 68]), which reduces a constrained problem to a
sequence of essentially unconstrained problems via penalty functions.
The relevance of this strategy to large-scale programming is hampered
by the fact that penalty functions tend to destroy linearity and linear
separability.

3.1  PIECEWISE  STRATEGY

Suppose that one must solve

$$\operatorname*{Maximize}_{y \in Y} \; v(y), \tag{3.1}$$

where v is a "piecewise-simple" function (e.g., piecewise-linear or
piecewise-quadratic) in the sense that there are regions (pieces)
$P^1, P^2, \dots$ of its domain such that v coincides with a relatively
tractable function $v^k$ on $P^k$. The situation can be depicted as in
Fig. 6, in which Y is a disk partitioned into four regions.

Fig. 6 (axes $y_1$ and $y_2$)

Let us further suppose that v is concave on the convex set Y and that,
given any particular point in Y, we can explicitly characterize the
particular piece to which that point belongs, as well as v on that piece.
Then it is natural to consider solving (3.1) in the following piecemeal
fashion that takes advantage of the piecewise-simplicity of v. Note that
it is unnecessary to explicitly characterize all of the pieces in advance.

The Piecewise Strategy

Step 1  Let a point $y^0$ feasible in (3.1) be given.
        Determine the corresponding piece $P^0$ containing
        $y^0$ and the corresponding function $v^0$.

Step 2  Maximize $v^0(y)$ subject to $y \in Y \cap P^0$. Let
        y' be an optimal solution (an infinite
        optimal value implies termination).

Step 3  Determine a piece P' adjacent to $P^0$ at y'
        such that v(y) > v(y') for some $y \in Y \cap P'$
        [if none exists, y' is optimal in (3.1)].
        Determine the corresponding function v'
        and return to Step 2 with P', v', and y'
        in place of $P^0$, $v^0$, and $y^0$.


A hypothetical trajectory for y is traced in Fig. 6 as a dotted
line. Optimizations (Step 2) were performed in three regions before
the optimal solution of (3.1) was found.

The problem at Step 2 has a simpler criterion function than (3.1)
itself, although it has more constraints ($y \in P^0$). If it is sufficiently
simple  by  comparison  with  (3.1),  then  the  Piecewise  strategy  is 
likely  to  be  advantageous  provided  Steps  1  and  3  are  not  too  difficult. 
Both  Steps  2  and  3  can  give  rise  to  "subproblems"  when  this  strategy 
is  used  for  large-scale  programming. 
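
The three steps can be sketched on a one-dimensional instance in which v is the pointwise minimum of a few linear functions, so that v is concave and piecewise-linear and each piece is an interval. Everything specific here is a hypothetical choice made for illustration: the three lines, the set Y = [0, 6], the starting point, and the helper names `piece`, `maximize_on`, and `piecewise_strategy`. Degenerate ties are not handled carefully.

```python
def piece(k, lines, lo, hi):
    """Interval of [lo, hi] on which line k is the pointwise minimum (None if empty)."""
    ak, bk = lines[k]
    for j, (aj, bj) in enumerate(lines):
        if j == k:
            continue
        d = ak - aj                  # line k is below line j where d * y <= bj - bk
        if d > 0:
            hi = min(hi, (bj - bk) / d)
        elif d < 0:
            lo = max(lo, (bj - bk) / d)
        elif bk > bj:
            return None              # line k lies strictly above a parallel line
    return (lo, hi) if lo <= hi else None

def maximize_on(k, lines, interval):
    """Step 2: maximize the linear function v^k over an interval."""
    a, b = lines[k]
    y = interval[1] if a > 0 else interval[0]
    return y, a * y + b

def piecewise_strategy(lines, lo, hi, y0, tol=1e-9):
    # Step 1: the piece containing y0 is identified by the line active at y0.
    k = min(range(len(lines)), key=lambda j: lines[j][0] * y0 + lines[j][1])
    visited = []
    while True:
        visited.append(k)
        y, val = maximize_on(k, lines, piece(k, lines, lo, hi))   # Step 2
        # Step 3: look for an adjacent piece (another line active at y) that
        # contains a point with a strictly larger value of v.
        nxt = None
        for j, (a, b) in enumerate(lines):
            if j != k and abs(a * y + b - val) <= tol:
                p = piece(j, lines, lo, hi)
                if p is not None and maximize_on(j, lines, p)[1] > val + tol:
                    nxt = j
                    break
        if nxt is None:
            return y, val, visited   # no improving adjacent piece: y is optimal
        k = nxt

# Hypothetical data: v(y) = min(y, 0.25*y + 1.5, 6 - y) on Y = [0, 6].
lines = [(1.0, 0.0), (0.25, 1.5), (-1.0, 6.0)]
y_opt, v_opt, visited = piecewise_strategy(lines, 0.0, 6.0, y0=0.5)
```

In this run only two of the three pieces are ever optimized over; the third is examined at Step 3 and rejected because it offers no improvement.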

The principal use of the Piecewise strategy in large-scale programming
is for problems resulting from Projection and Dualization.
In both cases [cf. (2.2)], v involves the optimal value of an associated
"inner" optimization problem parameterized by y. Evaluating v requires
solving the inner problem, and so v is not explicitly available in
closed form. Fortunately, it usually happens that evaluating $v(y^0)$
yields as a by-product a characterization of the piece $P^0$ containing
$y^0$ on which v has relatively simple form. We shall illustrate this
with a simple example. See also Sec. 4.2 and [Geoffrion 68b, Sec. 5].

The Piecewise strategy can also be used to motivate a generalization
of the Simplex Method that allows the minimand to be a sum of
piecewise-linear univariate convex functions [Orden and Nalbandian 68].

Example

Constrained games and similar applications can lead to problems
of the form

$$\operatorname*{Maximize}_{y \in Y} \Big[ \operatorname*{Minimum}_{x \ge 0} \; H^t(y)\, x \quad \text{s.t.} \quad Ax = b \Big], \tag{3.2}$$

where $H(\cdot)$ is a concave vector-valued function on the convex set Y.
The maximand of (3.2), v, is concave because it is the pointwise minimum
of a collection of concave functions of y. Suppose that we evaluate
v at $y^0 \in Y$, with the corresponding optimal solution of the inner
problem being $x^0$. The value is $H^t(y^0)\, x^0$. We know from the elementary
theory of linear programming that, since changes in y cannot affect
the feasibility of $x^0$, $x^0$ remains an optimal solution of the inner
problem as y varies so long as the "reduced costs" remain of the right
sign. Hence the value of v(y) is $H^t(y)\, x^0$ for all y such that

$$(H^B(y))^t B^{-1} A_j - H_j(y) \le 0, \quad \text{all nonbasic } j, \tag{3.3}$$

where $A_j$ is the j-th column of A, and the component functions of $H^B$
correspond to the variables $x_j$ in the optimal basis matrix B at $y^0$.
Thus we see how to accomplish Step 1, and the problem to be solved at
Step 2 is

$$\operatorname*{Maximize}_{y \in Y} \; H^t(y)\, x^0 \quad \text{s.t.} \quad (3.3). \tag{3.4}$$

Note that (3.4) has the advantage over (3.2) of an explicit criterion
function. Since $x^0 \ge 0$, $H^t(\cdot)\, x^0$ is concave on Y.
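
A very small special case shows what (3.3) and (3.4) look like concretely. The choice of data is hypothetical, and it assumes the inner constraints have the equality form $Ax = b$, $x \ge 0$: take $A = (1, \dots, 1)$ and $b = 1$, so that the inner feasible set is the unit simplex.

```latex
% Inner problem over the unit simplex: the minimum picks out the smallest component.
v(y) = \min_{x \ge 0} \Bigl\{ \sum_j H_j(y)\, x_j : \sum_j x_j = 1 \Bigr\}
     = \min_j H_j(y).
% Suppose the minimum at y^0 is attained by index k. Then x^0 = e_k, the basis is
% the single column B = (1), H^B = H_k, and B^{-1} A_j = 1 for every j, so that
% (3.3) and (3.4) read
H_k(y) - H_j(y) \le 0 \quad (j \ne k),
\qquad
\operatorname*{Maximize}_{y \in Y} \; H_k(y)
\quad \text{s.t.} \quad H_k(y) \le H_j(y), \; j \ne k .
```

The piece containing $y^0$ is thus the region on which $H_k$ is the smallest component, and on that piece v coincides with the explicit function $H_k$.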

Suppose that y' is an optimal solution of (3.4).† If y' is not
optimal in (3.2), then there must be an alternate optimal basis B' at
y' such that the corresponding problem (3.4) admits an improved solution.
At worst, such an "improving" basis could be found by enumerating
the alternative optimal bases at y'. At best, an improving basis
would be revealed by a single active constraint among those of (3.3)
at y'. One could also compute an improving feasible direction z' for
(3.2) at y' (cf. Sec. 4.4); the appropriate improving basis would then
be revealed by a parametric linear programming analysis of the inner
problem.

†It may be difficult to find a global optimum of (3.4) if H is
not linear, for then (3.3) need not define a convex feasible region
(unless $B^{-1} A_j \le 0$ for all nonbasic j). Fortunately, however, it can
be seen from the concavity of v that a local optimum will generally
suffice, although finite termination may now be in jeopardy.

3.2  RESTRICTION

Restriction is a solution strategy principally useful for problems
with many nonnegative variables, the data associated with some of which
perhaps being only implicitly available. Combinatorial models and Inner
Linearization are two fertile sources of such problems.

The basic idea is as follows: solve the given problem subject to
the additional restriction that a certain subset of the variables must
have value 0; if the resulting solution does not satisfy the optimality
conditions of the given problem, then "release" one or more restricted
variables (allow them to be nonnegative) and solve this less-restricted
problem; continue in this fashion until the optimality conditions of
the given problem are satisfied, at which point the procedure terminates.
An  important  refinement  forming  an  integral  part  of  the  strategy  involves 
adding  variables  to,  as  well  as  releasing  them  from,  the  restricted  set. 
Note  that  the  variables  restricted  to  0  essentially  drop  out  of  the 
problem,  thereby  reducing  its  size  and  avoiding  the  need  for  knowing 
the  associated  data  explicitly.  If  (as  is  usually  the  case)  only  a 
fairly  small  proportion  of  all  variables  actually  are  active  (positive) 
at  an  optimal  solution,  then  this  strategy  becomes  quite  attractive. 

The  earliest  and  most  significant  embodiment  of  the  Restriction 
strategy  turns  out  to  be  the  Simplex  Method  for  linear  programming 
itself. It can be shown, as we shall indicate, that a natural specialization
of Restriction to the completely linear case yields the very
same sequence of trial solutions as does the ordinary Simplex Method.
All of the column-generation schemes for implementing the Simplex
Method for linear programs with a vast number of variables can therefore
be viewed in terms of Restriction. We shall review one of these schemes
[Gilmore and Gomory 61] at the end of this section.* The usefulness of
Restriction  is  not,  however,  limited  to  the  domain  of  linear  programming. 
It  will  be  shown  in  Sec.  4.3  how  this  strategy  can  yield,  in  a  nonlinear 
case,  variations  of  the  Dantzig-Wolfe  method  for  convex  programming. 

*Another column-generating scheme is explained in Sec. 4.3. See
also part B of Table 1.

Formal  Statement

Consider the problem

$$\operatorname*{Maximize}_{x \in X} \; f(x) \quad \text{s.t.} \quad g_i(x) \ge 0, \quad i = 1, \dots, m, \tag{3.5}$$

where f is a concave function on the nonempty convex set $X \subseteq R^n$ and
the functions $g_1, \dots, g_m$ are all linear. All nonlinear constraints,
as well as any linear constraints that are not to be restricted, are
presumed to be incorporated in X. The typical restricted version of
(3.5) is the (still concave) problem

$$\operatorname*{Maximize}_{x \in X} \; f(x) \quad \text{s.t.} \quad g_i(x) = 0, \; i \in S; \qquad g_i(x) \ge 0, \; i \notin S, \tag{3.6}$$

where S is a subset of the m constraint indices. [Note that we are
presenting Restriction in a seemingly more general setting than the
motivational one above in that general linear inequality constraints,
as well as simple variable nonnegativities, are allowed to be restricted
to equality. Actually, the present setting is no more general since
slack variables could be introduced to accommodate the restriction of
general linear inequalities.] Some, none, or all of the $x_j \ge 0$ type
constraints (if any) may be included among $g_1, \dots, g_m$. The analyst
is free to choose the linear inequality constraints to associate with
X; the rest are candidates for restriction.

An optimal solution of the restricted problem (3.6) will be denoted
by $x^s$, and a corresponding optimal multiplier vector (which, under
mild assumptions, must exist) is denoted by $\mu^s = (\mu_1^s, \dots, \mu_m^s)$.
The pair $(x^s, \mu^s)$ satisfies the Kuhn-Tucker optimality conditions for
(3.6), namely

(i)    $x^s$ maximizes $f(x) + \sum_{i=1}^{m} \mu_i^s g_i(x)$ over X
(ii)   $x^s$ is feasible in (3.6)
(iii)  $\mu_i^s \ge 0$, $i \notin S$
(iv)   $\mu_i^s g_i(x^s) = 0$, $i \notin S$.

We are now ready to give a formal statement of Restriction
applied to (3.5). Notice that not only are constraints released from
the current restricted set S at each iteration, but additions are
also made whenever $g_i(x^s) = 0$ for some $i \notin S$, provided that $f(x^s)$ has
just increased.


The Restriction Strategy

Step 1  Put $\bar{f} = -\infty$ and S equal to any subset of indices
        such that the corresponding restricted problem
        (3.6) is feasible.

Step 2  Solve (3.6) for an optimal solution $x^s$ and associated
        optimal multipliers $\mu^s$ (if it has unbounded
        optimal value, the same must be true
        of the given problem (3.5) and we terminate).
        If $\mu_i^s \ge 0$ for all $i \in S$, then terminate ($x^s$ is
        optimal in (3.5)); otherwise, go on to Step 3.

Step 3  Put V equal to any subset of S that includes
        at least one constraint for which $\mu_i^s < 0$. If
        $f(x^s) > \bar{f}$, replace $\bar{f}$ by $f(x^s)$ and S by E-V,
        where $E \triangleq \{1 \le i \le m : g_i(x^s) = 0\}$; otherwise
        (i.e., if $f(x^s) = \bar{f}$), replace S by S-V. Return
        to Step 2.


We  assume  that  the  given  problem  (3.5)  admits  a  feasible  solution, 
so  that  Step  1  is  possible.  To  ensure  that  Step  2  is  always  possible, 
we  also  assume  that  the  restricted  problem  (3.6)  admits  an  optimal 
solution  and  multiplier  vector  whenever  it  is  feasible  and  has  finite 
supremal  value.  It  is  a  straightforward  matter  to  show  that  the 
termination  conditions  of  Step  2  are  valid,  and  Step  3  is  obviously 
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always  possibLe.  Thus  the  strategy  is  well  defined,  although  we  have 
deliberately  not  specified  how  to  carry  out  each  step. 

An important property is that the sequence $\langle f(x^s) \rangle$ is
nondecreasing. Thus the strategy yields an improving sequence of feasible
solutions to (3.5). Moreover, $\langle f(x^s) \rangle$ can be stationary in
value at most a finite number of consecutive times, since the role of
$\bar f$ at Step 3 is to ensure that S is augmented (before deletion by V)
only when $f(x^s)$ has just increased. Hence termination must occur in a
finite number of steps, for there is only a finite number of possibilities
for S and each increase in $f(x^s)$ precludes repetition of any
previous S.
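
The three steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a prescribed implementation: the interface `solve_restricted(S)` (returning a solution, the multipliers on S, and the objective value) and the helper `binding(x)` (returning the set E) are hypothetical names introduced here, V is taken to be the single most negative multiplier, and the unbounded case of Step 2 is not handled. The toy instance (maximize $-\sum_i (x_i - t_i)^2$ subject to $x_i \ge 0$) is likewise an assumption chosen because its restricted problems have closed-form solutions.

```python
import math

def restriction(solve_restricted, binding, S0):
    """Run the Restriction loop until no multiplier on S is negative.

    solve_restricted(S) -> (x, mu, fx): optimum of the restricted problem,
        multipliers mu[i] for i in S, and objective value (hypothetical interface).
    binding(x) -> set E of indices i with g_i(x) == 0.
    """
    f_bar = -math.inf                                 # Step 1
    S = set(S0)
    while True:
        x, mu, fx = solve_restricted(S)               # Step 2
        negative = [i for i in S if mu[i] < 0]
        if not negative:
            return x                                  # optimal in the given problem
        V = {min(negative, key=lambda i: mu[i])}      # Step 3: most negative multiplier
        if fx > f_bar:
            f_bar = fx
            S = binding(x) - V                        # S <- E - V
        else:
            S = S - V                                 # S <- S - V


# Toy instance (an assumption for illustration):
# maximize -sum((x_i - t_i)^2) subject to g_i(x) = x_i >= 0.
TARGET = [3.0, -2.0, 0.5, -1.0]

def solve_toy(S):
    # Coordinates in S are pinned at 0; the others take max(t_i, 0).
    x = [0.0 if i in S else max(t, 0.0) for i, t in enumerate(TARGET)]
    # Stationarity of f(x) + sum(mu_i * x_i) gives mu_i = 2 * (x_i - t_i).
    mu = {i: 2.0 * (x[i] - TARGET[i]) for i in S}
    fx = -sum((xi - t) ** 2 for xi, t in zip(x, TARGET))
    return x, mu, fx

def binding_toy(x):
    return {i for i, xi in enumerate(x) if xi == 0.0}

x_opt = restriction(solve_toy, binding_toy, S0={0, 1, 2, 3})
```

Starting with every constraint held at equality, the loop releases one constraint per iteration, and the objective value increases each time until all multipliers on S are nonnegative.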

Options  and  Relation  to  the  Simplex  Method 

Let us now consider the main options of Restriction beyond the
decision as to which of the linear inequality constraints will
comprise $g_1, \ldots, g_m$.

(i)  How to select the initial S at Step 1?

(ii)  How to solve (3.6) for $(x^s, \mu^s)$ at Step 2?

(iii)  What criterion to use in selecting V at Step 3?

How these options are exercised greatly influences the efficiency
of the strategy.

As  stated  above,  there  is  an  intimate  relationship  between 
Restriction  and  the  Simplex  Method  in  the  completely  linear  case. 


Given the linear program

    $\text{Maximize}_x \; c^t x \quad \text{s.t.} \quad Ax = b, \quad x \ge 0,$

define (3.5) according to the identifications

    $f(x) = c^t x$

    $g_i(x) = x_i$, all i

    $X = \{x : Ax = b\}$,

and specialize Restriction as follows: let the initial S be chosen
to coincide with the nonbasic variables in an initial basic feasible
solution, and select V at Step 3 to be the index of the most negative
$\mu_i^s$. It can then be shown, under the assumption of nondegeneracy,
that Restriction is equivalent to the usual Simplex Method in that
the set of nonbasic variables at the $\nu$-th iteration of the Simplex
Method necessarily coincides with E at the $\nu$-th iteration of Restriction,
and the $\nu$-th basic feasible solution coincides with the $\nu$-th optimal
solution $x^s$ of (3.6). Thus Restriction can be viewed as one possible
strategic generalization of the Simplex Method. Not only is this an
interesting fact in its own right, but it also permits us to draw some
inferences--as we shall see in the discussion below--concerning how
best to exercise the options of Restriction.

Step  1 

The  selection  of  the  initial  S  should  be  guided  by  two  objectives: 
to  make  the  corresponding  restricted  problem  easy  to  solve  by  comparison 
with  the  given  problem,  and  to  utilize  any  prior  knowledge  that  may 
be available concerning which of the $g_i$ constraints are likely to hold
with equality at an optimal solution. In the Simplex Method, for
example, the initial choice of S implies that the restricted problem is
trivial since it has a unique feasible solution; at every subsequent
execution of Step 2, the restricted problem remains nearly trivial with
essentially only one free variable (the entering basic variable). Useful
prior knowledge is often available if the given problem is amenable
to physical or mathematical insight or if a variant has been solved
previously.

Step  2 

How to solve the restricted problem for $(x^s, \mu^s)$ at Step 2 depends,
of course, on its structure. Ideally, enough constraints will be
restricted to equality to make it vastly simpler than the original
problem. In any event, it is advisable to take advantage of the fact
that a sequence of restricted problems must be solved as the Restriction
strategy is carried out. Except for the first execution of
Step 2, then, what is required is a solution recovery technique that
effectively utilizes the previous solution. The pivot operation performs
precisely this function in the Simplex Method, and serves as an
ideal to be approached in nonlinear applications of Restriction.

It is worth mentioning that many solution (or solution recovery)
techniques that could be used for the restricted problem automatically
yield $\mu^s$ as well as $x^s$. When this is not the case, one may find
$\mu^s$ once $x^s$ is known by solving a linear problem if f and the
constraint functions defining X are differentiable, since under these
conditions the Kuhn-Tucker optimality conditions for (3.6) in
differential form become linear in $\mu$.

Step 3

Perhaps the most conspicuous criterion for choosing V at Step 3
is to let it be the index of the constraint corresponding to the most
negative $\mu_i^s$. One rationale for this criterion is as follows. Suppose
that $\mu^s$ is unique. It can then be shown (see [Geoffrion 69] or
[Rockafellar 68]) that the optimal value of the restricted problem is
differentiable as a function of perturbations about 0 of the right-hand
side of the $g_i$ constraints, and that $-\mu_i^s$ is the partial derivative
of the optimal value with respect to such perturbations of the i-th
constraint. Thus the most negative $\mu_i^s$ identifies the constraint in S
whose release will lead to the greatest initial rate of improvement in
the value of f as this constraint is permitted to deviate positively
from strict equality. It can be argued that $\mu^s$ is likely to be unique,
but if we drop this supposition then $-\mu_i^s$ still provides an upper
bound on the initial rate of improvement even though differentiability
no longer holds.

This most-negative-multiplier criterion is precisely the usual
criterion used by the Simplex Method in its version of Step 3 to
select the entering basic variable, but it is by no means the only
criterion used. The extensive computational experience presently
available with different criteria used in the Simplex Method may
permit some inferences to be drawn concerning the use of analogous
criteria in the nonlinear case. It has been observed [Wolfe and
Cutler 63], for example, that the most-negative-multiplier criterion
typically leads to a number of iterations equal to about twice the
number of constraints, and that other plausible criteria can be
expected to be consistently better by no more than a factor of two or
so.* Lest it be thought that V must necessarily be a singleton, we
note that we may interpret Wolfe and Cutler to have also observed
[ibid., p. 190] that choosing V to consist of the five most negative
multipliers reduced the number of iterations by a factor of two as
compared with the single-most-negative-multiplier choice.** Of course,
this increases the time required to solve each restricted problem.
Experience such as this should at least be a source of hypotheses to
be examined in nonlinear applications of Restriction.


Mechanizing  the  "Pricing"  Operation 

Each iteration of Restriction requires determining whether
there exists a negative multiplier and, if so, at least one must be
found. In the ordinary Simplex Method, which as has been indicated
can be viewed as a particular instance of Restriction, this was
originally done enumeratively by scanning the row of reduced costs
for an entry of the "wrong" sign. To deal with large numbers of
variables, however, it is desirable whenever possible to replace
this enumeration by an algorithm that exploits the structure of the
problem. This is referred to as mechanized pricing.


*An example of another plausible criterion is this: select V to
be the index of the constraint which, when deleted from S, will result
in the greatest possible improvement in the optimal value of the
restricted problem. Of course, this criterion is likely to be prohibitively
expensive computationally in the nonlinear case.

**This is known as multiple pricing, a feature used in most
production linear programming systems designed for large-scale problems.
See, for example, [Orchard-Hays 68, Sec. 6.1].

Mechanized pricing is widely practiced in the context of linear
programming, where it is often referred to as column generation.
Since the pioneering paper [Ford and Fulkerson 58], many authors
have shown how pricing could be mechanized by means of subsidiary
network flow algorithms, dynamic programming, integer programming,
and even linear programming. See the references of part B of Table 1,
[Balinski 64], and [Gomory 63]. It will suffice to mention here
but one specific illustration: the cutting-stock problem as treated
by [Gilmore and Gomory 61]. See also Sec. 4.3.


Cutting-Stock  Problem 

A simple version of Gilmore and Gomory's cutting-stock problem,
without the integrality requirement on x, is

(3.7)  $\text{Minimize}_{x \ge 0} \; \sum_j x_j \quad \text{s.t.} \quad \sum_j a_{ij} x_j \ge r_i, \quad i = 1, \ldots, m,$

where $a_{ij}$ is the number of pieces of length $\ell_i$ produced when the
cutting knives are set in the j-th pattern, $r_i$ is the minimum number of
required pieces of length $\ell_i$, and $x_j$ is the number of times a bar of
stock is cut according to pattern j. The number of variables is very
large because of the great variety of ways in which a bar of stock
can be cut. It is easy to see that each column of the matrix A
is of the form $(y_1, \ldots, y_m)^t$, where y is a vector of nonnegative
integers satisfying $\sum_{i=1}^{m} \ell_i y_i \le \lambda$ ($\lambda$ is the length of a bar of
stock); and conversely, every such vector corresponds to some column
(assuming that all possible patterns are allowed). When Restriction
is applied to (3.7) in the form of the Simplex Method, it follows that
the problem of determining the most negative multiplier can be expressed
as the subsidiary optimization problem

(3.8)  $\text{Minimize}_{y \ge 0} \; 1 - u^t y \quad \text{s.t.} \quad \sum_{i=1}^{m} \ell_i y_i \le \lambda, \quad y \ \text{integer},$

where u is the known vector of the current "Simplex multipliers."
If slack variables are given priority over structural variables in
determining entering basic variables (cf. Sec. 4.3), then u can be
assumed nonnegative and (3.8) is a problem of the well-known "knapsack"
variety, for which very efficient solution techniques are available.
See [Gilmore and Gomory 61] for full details.
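
To make the pricing subproblem (3.8) concrete, the following Python sketch solves it by a textbook dynamic program over integer capacities. It is an illustration only and is not the Gilmore-Gomory procedure itself: it assumes integer piece lengths and bar length and a nonnegative multiplier vector, and the names `price_pattern`, `u`, `lengths`, and `bar_length` are introduced here for the example.

```python
def price_pattern(u, lengths, bar_length):
    """Return (reduced_cost, pattern) minimizing 1 - u.y over feasible patterns.

    Assumes integer piece lengths, an integer bar length, and nonnegative u.
    """
    best = [0.0] * (bar_length + 1)     # best[c]: max of u.y using capacity at most c
    last = [None] * (bar_length + 1)    # piece added last to attain best[c], if any
    for cap in range(1, bar_length + 1):
        best[cap] = best[cap - 1]       # option: leave one unit of the bar unused
        for i, (length, value) in enumerate(zip(lengths, u)):
            if length <= cap and best[cap - length] + value > best[cap]:
                best[cap] = best[cap - length] + value
                last[cap] = i
    # Walk back through the table to recover the pattern y.
    pattern = [0] * len(lengths)
    cap = bar_length
    while cap > 0:
        if last[cap] is None:
            cap -= 1
        else:
            pattern[last[cap]] += 1
            cap -= lengths[last[cap]]
    return 1.0 - best[bar_length], pattern


# Pieces of length 3 and 5 cut from a bar of length 10.
reduced_cost, pattern = price_pattern([0.4, 0.7], [3, 5], 10)
```

A negative reduced cost means the returned pattern is a column worth bringing into the restricted problem; a nonnegative value means no pattern prices out under the current multipliers.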

3.3  RELAXATION 

Whereas Restriction is a solution strategy principally useful for
problems with a large number of variables, the complementary strategy
of Relaxation is primarily useful for problems with a large number of
inequality constraints, some of which may be only implicitly available.
Such problems occur, for example, as a result of Outer Linearization.*
One of the earliest uses of Relaxation was in [Dantzig, Fulkerson, and
Johnson 54], and since that time this strategy has appeared in one guise
or another in the works of numerous authors.** We discuss [Kelley 60]
at the end of this section, and [Benders 62] in Sec. 4.1.

*Relaxation can also be useful for dealing with large numbers
of nonnegative variables; when a constraint such as $x_j \ge 0$ is
relaxed, the variable $x_j$ can often be substituted out of the problem
entirely [Ritter 67c], [Webber and White 68].

The essential idea of Relaxation is this: solve a relaxed version
of the given problem ignoring some of the inequality constraints;
if the resulting solution does not satisfy all the ignored constraints,
then generate and include one or more violated constraints in the
relaxed problem and solve it again; continue in this fashion until a
relaxed problem solution satisfies all of the ignored constraints, at
which point an optimal solution of the given problem has been found.
An important refinement involves dropping amply satisfied constraints
from the relaxed problem when this does not destroy the inherent
finiteness of the procedure. We give a formal statement of Relaxation
(with the refinement) below.

Relaxation and Restriction are complementary strategies in a very
strong sense of the word. In linear programming, for example, whereas
a natural specialization of Restriction is equivalent to the ordinary
Simplex Method, it is also true [Geoffrion 68a] that a similar
specialization of Relaxation is equivalent to Lemke's Dual Method. It
follows, very significantly, that Restriction (Relaxation) applied to a
linear program essentially corresponds to Relaxation (Restriction)
applied to the dual linear program. In fact [ibid.], the same assertion
holds for quite general convex programs as well. This complementarity
makes it possible to translate most statements about Restriction into
statements about Relaxation, and conversely.

**Relaxation without problem manipulation is used in Dantzig 55a,
Sec. 3; Stone 58; Thompson, Tonge and Zionts 66; Ritter 67c; Grigoriadis
and Ritter 68. The following papers all use the pattern Outer
Linearization/Relaxation: Cheney and Goldstein 59; Kelley 60; Dantzig and
Madansky 61, p. 174; Parikh 67; Veinott 67. The references of part A
of Table 2 all use the pattern Projection, Outer Linearization/Relaxation.
See also the second footnote in Sec. 1.2.

Since we have already given a relatively detailed discussion of
Restriction, a somewhat abbreviated discussion of Relaxation will
suffice. See [ibid.] for a more complete discussion.

Formal  Statement 

Let $f, g_1, \ldots, g_m$ be concave functions on a nonempty convex
set $X \subseteq R^n$. The concave program

(3.9)  $\text{Maximize}_{x \in X} \; f(x) \quad \text{s.t.} \quad g_i(x) \ge 0, \quad i = 1, \ldots, m$

is solved by solving a sequence of relaxed problems of the form

(3.10)  $\text{Maximize}_{x \in X} \; f(x) \quad \text{s.t.} \quad g_i(x) \ge 0, \quad i \in S,$

where S is a subset of $\{1, \ldots, m\}$. Assume that (3.10) admits an
optimal solution $x^s$ whenever it admits a feasible solution and its
maximand is bounded above on the feasible region, and assume further
that an initial subset of constraint indices is known such that
(3.10) has a finite optimal solution. (This assumption can be
enforced, if necessary, by enforcing continuity of all functions and
compactness of X.)

Under these assumptions, it is not difficult to show that the
following strategy is well defined and terminates in a finite number of
steps with either an optimal solution of the given problem (3.9) or
knowledge that none exists; moreover, in the first case a nonincreasing
sequence $\langle f(x^s) \rangle$ of upper bounds on the optimal value of (3.9) is
obtained and the first solution of (3.10) that is feasible in (3.9) is
also optimal. This version of Relaxation deletes amply satisfied
constraints from S so long as $\langle f(x^s) \rangle$ is decreasing.

The Relaxation Strategy

Step 1   Put $\bar f = \infty$ and S equal to any subset of indices
         such that the corresponding relaxed problem
         (3.10) has a finite optimal solution.

Step 2   Solve (3.10) for an optimal solution $x^s$ if
         one exists; if none exists (i.e., if the relaxed
         problem is infeasible), then terminate
         (the given problem is infeasible). If
         $g_i(x^s) \ge 0$ for all $i \notin S$, then terminate
         ($x^s$ is optimal in the given problem); otherwise,
         go on to Step 3.

Step 3   Put V equal to any subset of constraint
         indices that includes at least one constraint
         such that $g_i(x^s) < 0$. If $f(x^s) < \bar f$, replace $\bar f$
         by $f(x^s)$ and S by $E \cup V$, where
         $E \triangleq \{i \in S : g_i(x^s) = 0\}$; otherwise (i.e., if
         $f(x^s) = \bar f$), replace S by $S \cup V$. Return to Step 2.
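
The Relaxation loop can be sketched in the same spirit. The Python below is a minimal illustration under assumptions of our own: the interface `solve_relaxed(S)`, the constraint evaluator `g(i, x)`, the selection rule `choose`, and the numerical tolerance `tol` are all hypothetical, and the one-dimensional toy problem (maximize x over X = [0, 10] subject to x not exceeding each of several bounds) was chosen only because its relaxed problems can be solved by inspection.

```python
import math

def relaxation(solve_relaxed, g, m, S0=(), choose=None, tol=1e-9):
    """Run the Relaxation loop; return an optimal x, or None if infeasible.

    solve_relaxed(S) -> (x, fx), or None if the relaxed problem is infeasible
        (hypothetical interface).
    g(i, x) -> value of the i-th constraint function; satisfied when >= 0.
    choose(violated, x) -> nonempty subset of the violated indices (the set V).
    """
    if choose is None:
        # Default criterion: the single most violated constraint.
        choose = lambda violated, x: {min(violated, key=lambda i: g(i, x))}
    f_bar = math.inf                                        # Step 1
    S = set(S0)
    while True:
        solution = solve_relaxed(S)                         # Step 2
        if solution is None:
            return None                                     # given problem infeasible
        x, fx = solution
        violated = [i for i in range(m) if i not in S and g(i, x) < -tol]
        if not violated:
            return x                                        # optimal in the given problem
        V = set(choose(violated, x))                        # Step 3
        if fx < f_bar:
            f_bar = fx
            E = {i for i in S if abs(g(i, x)) <= tol}       # binding constraints only
            S = E | V                                       # amply satisfied ones are dropped
        else:
            S = S | V


# Toy instance (an assumption for illustration):
# maximize x over X = [0, 10] subject to g_i(x) = BOUNDS[i] - x >= 0.
BOUNDS = [7.0, 3.0, 5.0, 9.0, 1.0]

def g_toy(i, x):
    return BOUNDS[i] - x

def solve_toy(S):
    x = min([10.0] + [BOUNDS[i] for i in S])
    return x, x

x_most_violated = relaxation(solve_toy, g_toy, m=5)
x_first_violated = relaxation(solve_toy, g_toy, m=5,
                              choose=lambda violated, x: {violated[0]})
```

With the first-violated rule the loop passes through several relaxed problems and drops constraints that have become amply satisfied, whereas the most-violated rule reaches the same answer after a single cut; the choice of V affects only the path, not the result.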


Discussion 


As with Restriction, the analyst has considerable leeway concerning
how he applies the Relaxation strategy. For instance, he can
select the constraints that are to be candidates for Relaxation
($g_1, \ldots, g_m$) in any way he wishes; the rest comprise X. He is free
to choose the initial S so as to allow an easy start, or to take
advantage of prior knowledge concerning which of the constraints might
be active at an optimal solution. He can choose the most effective
solution recovery technique to reoptimize the successive relaxed
problems. And, very importantly, he can choose the criterion by
which V will be selected at Step 3 and the method by which the
criterion will be implemented.

Probably the most natural criterion is to let V be the index of
the most violated constraint. This is the criterion most commonly
employed in the Dual Method of linear programming, for example,
although other criteria are possible. The complementarity between
Relaxation and Restriction mentioned earlier enables us to interpret
existing computational experience in linear programming so as to
shed light on the merits and demerits of several alternative criteria.
The discussion of Step 3 of Restriction should make further discussion
of this point unnecessary. We should remark, however, that in some
applications (e.g., [Dantzig, Fulkerson and Johnson 54], [Gomory 58],
[Kelley 60]) only one or a few violated constraints are accessible
each time the relaxed problem is solved, and it is therefore indicated
that these be used regardless of whether they satisfy any
particular criterion. In other applications a criterion such as
"most violated constraint" is within the realm of attainability,
and can be approached via a subsidiary linear program [Benders 62],
network flow problem [Gomory and Hu 62], or some other subsidiary
optimization problem that is amenable to efficient solution. This
is the counterpart of mechanized pricing in Restriction.

Restriction  and  Relaxation,  opposites  though  they  are  to  one 
another,  are  by  no  means  incompatible.  In  fact  it  can  be  shown 
[Geoffrion 66 and 67] that both strategies can be used simultaneously.
The reduced problems become still more manageable, but
assurance  of  finite  termination  requires  more  intricate  control. 

The  Cutting-Plane  Method 

One important use of Relaxation occurs, as we have mentioned,
in connection with problems that have been outer-linearized. This
will be illustrated in the simplest possible setting in terms of the
problem

(3.11)  $\text{Minimize}_{x \ge 0} \; c^t x \quad \text{s.t.} \quad Ax \ge b, \quad g(x) \le 0,$

where g is a convex function that is finite-valued on
$X \triangleq \{x \ge 0 : Ax \ge b\}$. If one manipulates (3.11) by invoking an
arbitrarily fine outer-linearization of g and then applies the
Relaxation strategy with the new approximating constraints as the
candidates for being relaxed, the resulting procedure is that of
[Kelley 60].

Let us assume for simplicity that g is differentiable on X.*
Then g has a linear support $g(\bar x) + \nabla g(\bar x)(x - \bar x)$ at every point
$\bar x$ in X, where $\nabla g(\bar x)$ is the gradient of g at $\bar x$, and so (3.11) is
equivalent to

(3.12)  $\text{Minimize}_{x \in X} \; c^t x \quad \text{s.t.} \quad g(\bar x) + \nabla g(\bar x)(x - \bar x) \le 0, \quad \text{all } \bar x \in X.$

The Relaxation strategy is the natural one for solving (3.12),
since it avoids the need to determine in advance all of the linear
supports of g. At each iteration, a relaxed version of this problem
with a finite number of approximating constraints is solved. The
optimal solution $\hat x$ of the relaxed problem is feasible in (3.12) if
and only if $g(\hat x) \le 0$; if $g(\hat x) > 0$, then evaluation of $\nabla g(\hat x)$ yields a
violated constraint that must be appended to the current relaxed
problem. Since each relaxed problem is a linear program that will
be augmented by a violated constraint, it is natural to reoptimize
it using postoptimality techniques based on the Dual Method for
linear programming.

*The assumption of differentiability can be weakened, since it
is only necessary for g to have a support at each point of X. And
even this requirement can be weakened as implicitly suggested in the
conclusion of Sec. 2.3 if (3.11) is phrased in terms of the epigraph
of g.

It  is  easy  to  generalize  this  development  to  cover  the  case  in 
which  (3.11)  has  several  (nonlinear)  convex  constraints  and  a  convex 
minimand. 

It should be pointed out that dropping amply satisfied constraints
from the relaxed problem--a feature incorporated in our statement of
Relaxation--is questionable in this context since (3.12) has an
infinite number of constraints. Without this feature, Kelley has
given mild conditions under which convergence to an optimal solution
of (3.11) is assured in the limit.
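
As a small illustration of the cutting-plane idea, the Python sketch below treats the one-variable case, where each relaxed linear program reduces to intersecting half-lines with an interval. It is a sketch under assumptions, not Kelley's procedure in general form: the function name `kelley_1d`, the interval bounds `lo` and `hi` standing in for X, the tolerance `tol`, and the iteration cap are all introduced here, and no cuts are ever dropped.

```python
import math

def kelley_1d(c, g, dg, lo, hi, tol=1e-9, max_iter=100):
    """Minimize c*x over [lo, hi] subject to g(x) <= 0, with g convex and
    differentiable, by accumulating linear supports of g as cuts."""
    cuts = []                      # each cut (a, b) encodes a*x + b <= 0
    x = None
    for _ in range(max_iter):
        # Solve the relaxed LP: intersect [lo, hi] with every cut so far.
        left, right = lo, hi
        for a, b in cuts:
            if a > 0:
                right = min(right, -b / a)
            elif a < 0:
                left = max(left, -b / a)
            elif b > 0:
                return None        # cut reads b <= 0 with b > 0: infeasible
        if left > right:
            return None            # relaxed problem infeasible
        x = left if c >= 0 else right
        if g(x) <= tol:
            return x               # feasible in the original problem, hence optimal
        # Linear support at x: g(x) + g'(x) * (z - x) <= 0, i.e. a*z + b <= 0.
        a = dg(x)
        b = g(x) - a * x
        cuts.append((a, b))
    return x                       # last iterate; may still violate g(x) <= 0


# Example: minimize -x subject to x^2 - 2 <= 0 on [0, 10].
x_star = kelley_1d(-1.0, lambda x: x * x - 2.0, lambda x: 2.0 * x, 0.0, 10.0)
```

In this particular example each new cut moves the right endpoint exactly as a Newton step on $x^2 - 2$ would, so the iterates decrease toward $\sqrt{2}$ from above, and every relaxed optimum is a lower bound on the true minimum value.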

We remark in passing that the approach of [Hartley and Hocking
63] for (3.11) can be viewed as Restriction applied to the dual of
(3.12). Since Relaxation of (3.12) corresponds to Restriction of
its dual, the two approaches are really equivalent.

4.  SYNTHESIZING ALGORITHMS FROM MANIPULATIONS AND STRATEGIES

This section further illustrates the problem manipulations and
solution strategies of the previous two sections by combining them in
various ways to yield several known algorithms. The main object is not
an exposition of these algorithms, although this is certainly important;
rather, we wish to focus on the principal patterns in which
manipulations and strategies can be assembled. These patterns constitute
the real common denominators in the literature on large-scale programming.
See Table 2.

It is beyond the scope of this effort to exemplify all of the
important patterns of manipulations and strategies. We shall limit our
discussion to five key ones:

1.  PROJECTION, OUTER LINEARIZATION/RELAXATION

2.  PROJECTION/PIECEWISE

3.  INNER LINEARIZATION/RESTRICTION

4.  PROJECTION/FEASIBLE DIRECTIONS

5.  DUALIZATION/FEASIBLE DIRECTIONS

The first pattern is illustrated in Sec. 4.1 by Benders' Partitioning
Procedure for what might be called semilinear programs; the second is
illustrated in Sec. 4.2 by Rosen's Primal Partition Programming
algorithm for linear programs with block-diagonal structure; the third in
Sec. 4.3 by Dantzig-Wolfe Decomposition; the fourth in Sec. 4.4 by a
procedure the author recently developed for nonlinear programs with
multidivisional structure; and the fifth in Sec. 4.5 by the "local"
approach discussed by Takahashi for concave programs with "complicating"
constraints. Another key pattern, OUTER LINEARIZATION/RELAXATION,
was already illustrated in Sec. 3.3 with reference to Kelley's
cutting-plane method. In addition, it is indicated in Sec. 4.2 how Rosen's
algorithm can be used to illustrate the pattern DUALIZATION/PIECEWISE,
and in Sec. 4.3 how Dantzig-Wolfe Decomposition can be used to
illustrate DUALIZATION, OUTER LINEARIZATION/RELAXATION.

The discussion of the various algorithms is as uncluttered by
detail as we have been able to make it. There is little or no mention
of how to find an initial feasible solution,* the details of
computational organization, or questions of theoretical convergence. The
reader is invited to ponder such questions in the light of the concepts
and results advanced in the previous two sections, and then to consult
the original papers.

4.1  [Benders 62]

One might refer to

(4.1)  $\text{Maximize}_{x \ge 0,\; y \in Y} \; c^t x + f(y) \quad \text{s.t.} \quad Ax + F(y) \le b$

as a semilinear program because it is a linear program in x when
y is held fixed temporarily. The algorithm of [Benders 62] for
this problem can be recovered by applying the pattern PROJECTION,
OUTER LINEARIZATION/RELAXATION. Specifically, project (4.1) onto the
space of the y variables, outer-linearize the resulting supremal
value function in the maximand, and apply the Relaxation strategy
to the new constraints arising as a consequence of Outer Linearization.
Assume for simplicity that (4.1) is feasible and has finite optimal
value.

*If one exists, it can usually be found by applying the algorithm
itself to a suitably modified version of the given problem.

Projection onto the space of the y variables yields

(4.2)  $\text{Maximize}_{y \in Y} \; \left[ f(y) + \mathrm{Sup}_{x \ge 0} \{ c^t x \ \text{s.t.} \ Ax \le b - F(y) \} \right].$

Note that the supremal value function appearing in the maximand
corresponds to the linear program

(4.3)  $\text{Maximize}_{x \ge 0} \; c^t x \quad \text{s.t.} \quad Ax \le b - F(y).$

This program is parameterized nonlinearly in the right-hand side by y,
and our assumption implies that it has a finite optimum for at least
one value of y. By the Dual Theorem, therefore, the dual linear
program

(4.4)  $\text{Minimize}_{u \ge 0} \; u^t (b - F(y)) \quad \text{s.t.} \quad u^t A \ge c^t$

must be feasible (for all y). Let $\langle u^1, \ldots, u^p \rangle$ be the extreme points
and $\langle u^{p+1}, \ldots, u^{p+q} \rangle$ representatives of the extreme rays of the
feasible region of (4.4) (cf. Th. 3). Again using the Dual Theorem, we see
that (4.3) is feasible if and only if (4.4) has finite optimal value,
that is, if and only if y satisfies the constraints

(4.5)  $(u^j)^t (b - F(y)) \ge 0, \quad j = p + 1, \ldots, p + q.$

Since we take the supremal value function in (4.2) to be $-\infty$ for y
such that (4.3) is infeasible--see Sec. 2.1--we may append the
constraints (4.5) to (4.2). Thus Projection applied to (4.1) yields
(4.2) subject to the additional constraints (4.5).

Next we outer-linearize the supremal value function appearing
in (4.2). It is easy to see, referring to (4.4), that its value is
precisely

(4.6)  $\text{Minimum}_{1 \le j \le p} \; \{ (u^j)^t (b - F(y)) \}$

for all y feasible in (4.2) with (4.5) appended. (Strictly speaking,
it is accurate to call this Outer "Linearization" only if F is linear.)
With this manipulation, (4.2) becomes

(4.7)  $\text{Maximize}_{y \in Y} \; \left[ f(y) + \text{Minimum}_{1 \le j \le p} \{ (u^j)^t (b - F(y)) \} \right] \quad \text{s.t.} \quad (4.5)$

or, with the help of an elementary manipulation based on the fact that
a minimum is really a greatest lower bound,

(4.8)  $\text{Maximize}_{y \in Y,\; y_0} \; f(y) + y_0$

       s.t.  $y_0 \le (u^j)^t (b - F(y)), \quad j = 1, \ldots, p$

             $(u^j)^t (b - F(y)) \ge 0, \quad j = p + 1, \ldots, p + q.$

This is the master problem to be solved.

Relaxation is a natural strategy for (4.8); it avoids having
to determine in advance all of the vectors $u^j$, $j = 1, \ldots, p + q$. To
test the feasibility of a trial solution $(\hat y_0, \hat y)$, where $\hat y \in Y$, one solves
the linear subproblem (4.4) with y equal to $\hat y$. If the infimal value
is greater than or equal to $\hat y_0$, then $(\hat y_0, \hat y)$ is feasible and therefore
optimal in (4.8); $\hat y$, along with $\hat x$ equal to the optimal dual variables
of (4.4), is an optimal solution of the given problem (4.1). If, on
the other hand, the infimal value is less than $\hat y_0$, then a violated
constraint of (4.8) is produced (some $u^j$ with $1 \le j \le p$ is found if
the infimal value is finite, while $p + 1 \le j \le p + q$ if it is $-\infty$). Of
course, f, F, and Y must satisfy the obvious convexity assumptions if
dropping amply satisfied constraints is to be justified. These
assumptions will probably have to hold anyway if the relaxed problems based
on (4.8) are to be concave programs (remember $u^j \ge 0$). There is,
however, at least one other interesting case: if Y is a discrete set, say
the integer points of some convex polytope, while f and F are linear,
then (4.8) is a pure (except for $y_0$) integer linear program (see
[Balinski and Wolfe 63], [Buzby, Stone and Taylor 65]).
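
The master/subproblem interplay can be illustrated with a deliberately tiny Python sketch. Everything specific here is an assumption made for the example rather than part of Benders' procedure: x is a single scalar with c > 0 and every a[k] > 0, so that the dual region of (4.4) has extreme points $(c/a_k) e_k$ and extreme rays $e_k$ and the subproblem has a closed form; Y is a small finite set, so the relaxed master is solved by enumeration; the initial master has no optimality cut, which is handled by treating $y_0$ as $+\infty$; and no cuts are dropped. The name `benders_toy` and its arguments are hypothetical.

```python
import math

def benders_toy(c, a, b, Y, f, F, tol=1e-9):
    """Maximize c*x + f(y) s.t. a[k]*x + F(y)[k] <= b[k] for all k, x >= 0, y in Y.

    Assumes scalar x, c > 0, all a[k] > 0, and a finite set Y (toy setting).
    Returns (x, y, value), or None if the problem is infeasible.
    """
    m = len(a)
    optimality, feasibility = [], []   # generated u^j: extreme points / extreme rays

    def residual(y):
        Fy = F(y)
        return [b[k] - Fy[k] for k in range(m)]

    def dot(u, r):
        return sum(uk * rk for uk, rk in zip(u, r))

    while True:
        # Relaxed master: enumerate Y, enforcing only the cuts generated so far.
        best = None
        for y in Y:
            r = residual(y)
            if any(dot(u, r) < -tol for u in feasibility):
                continue
            y0 = min((dot(u, r) for u in optimality), default=math.inf)
            if best is None or f(y) + y0 > best[0]:
                best = (f(y) + y0, y, y0)
        if best is None:
            return None                          # every y violates a feasibility cut
        _, y, y0 = best
        r = residual(y)
        # Subproblem (4.4), solved in closed form for this toy structure.
        k = min(range(m), key=lambda j: r[j])
        if r[k] < -tol:
            ray = [0.0] * m
            ray[k] = 1.0                         # dual unbounded along e_k
            feasibility.append(ray)
            continue
        k = min(range(m), key=lambda j: c * r[j] / a[j])
        value = c * r[k] / a[k]
        if value >= y0 - tol:
            x = r[k] / a[k]                      # optimal x for the fixed y
            return x, y, c * x + f(y)
        point = [0.0] * m
        point[k] = c / a[k]                      # violated optimality cut
        optimality.append(point)


result = benders_toy(
    c=2.0, a=[1.0, 2.0], b=[4.0, 6.0], Y=[0, 1, 2, 3, 4],
    f=lambda y: 3 * y, F=lambda y: [y, 2 * y],
)
```

On this instance the loop first generates an optimality cut, then a feasibility cut that excludes y = 4, and then stops at y = 3, which agrees with direct enumeration of Y.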

The present development seems preferable to the original one
since: (a) it justifies dropping amply satisfied constraints from
successive relaxed versions of (4.8); (b) it retains f(y) in its natural
position in the criterion function of (4.8) (Benders' version of (4.8),
which is also equivalent to (4.7), has $y_0$ alone as the criterion
function and an added term f(y) in the right-hand side of each of the first
p constraints); and (c) its comparative simplicity suggests a
generalization, with the help of nonlinear duality theory, permitting
nonlinearities in x. Details concerning (c) will be provided in a
forthcoming paper.

4.2  [Rosen 64]

The algorithm of [Rosen 64] for the linear program

(4.9)  $\text{Maximize}_{y, x_1, \ldots, x_\ell} \; b_0^t y + \sum_{i=1}^{\ell} b_i^t x_i \quad \text{s.t.} \quad x_i^t A_i + y^t D_i \le c_i^t, \quad i = 1, \ldots, \ell$

illustrates the pattern PROJECTION/PIECEWISE. Assume for simplicity
that (4.9) is feasible and has finite optimal value.

Projection onto the y variables yields the master problem

(4.10)  $\text{Maximize}_y \; \left[ b_0^t y + \sum_{i=1}^{\ell} \mathrm{Sup}_{x_i} \{ b_i^t x_i \ \text{s.t.} \ x_i^t A_i \le c_i^t - y^t D_i \} \right],$

where we have separated the supremum in the maximand (this separation
is perhaps the main justification for using Projection).

The Piecewise strategy is appropriate for (4.10) because each
supremal value in the maximand is piecewise-linear as a function of
y. This follows from the elementary theory of linear programming,
as we now explain. Let $\bar y$ be feasible in (4.10) in the sense that the
maximand is not $-\infty$. Then each of the $\ell$ linear programs appearing in
the  maximand  must  have  a  finite  optimal  value,  and  by  the  Dual  Theorem 
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(4.12)  (c*  -  yS^u. 

so long as the reduced costs remain of the correct sign, that is, so
long as $y$ satisfies the condition

(4.13)  $(c_i^t - y^t D_i)^B B_i^{-1} (A_i)_j - (c_i^t - y^t D_i)_j \le 0, \quad \text{all nonbasic } j,$

where the superscript $B$ masks all but the basic components of
$(c_i^t - y^t D_i)$. Thus the master problem (4.10), confined to the
linear "piece" containing the current $y$, becomes the linear program

(4.14)  $\underset{y}{\text{Maximize}} \;\; b_0^t y + \sum_{i=1}^{\ell} (c_i^t - y^t D_i)^B u_i^B \quad \text{s.t.} \quad (4.13), \quad i = 1, \ldots, \ell.$

This  shows  that  Step  2  of  the  Piecewise  strategy  can  be  accomplished 
by  linear  programming.  Rosen  actually  works  with  the  dual  of  (4.14). 
His  Theorems  1  and  2  concern  Step  3  (cf.  the  discussion  following  (3.4) 
in  Sec.  3.1). 
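As a small numerical illustration of why each supremal value in (4.10) is piecewise-linear in $y$, the sketch below takes a single block with a scalar $x$, so that the inner problem is $\sup\{\, b x : x a_j \le c_j - y d_j,\ j = 1, \ldots, n \,\}$. The restrictions $b > 0$ and $a_j > 0$, the data, and the function name `inner_value` are assumptions made only for this example and are not part of Rosen's algorithm. Under them the supremum equals $b \min_j (c_j - y d_j)/a_j$, the lower envelope of finitely many linear functions of $y$.

```python
def inner_value(y, b, a, c, d):
    """Sup of b*x subject to x*a[j] <= c[j] - y*d[j] for all j.

    Assumes b > 0 and a[j] > 0, so the supremum is attained at the
    tightest upper bound on x. Returns the value and the index of the
    constraint that determines it.
    """
    bounds = [(c[j] - y * d[j]) / a[j] for j in range(len(a))]
    j_active = min(range(len(a)), key=lambda j: bounds[j])
    return b * bounds[j_active], j_active


# Hypothetical data: two constraints, hence at most two linear pieces.
b, a, c, d = 2.0, [1.0, 1.0], [4.0, 6.0], [0.0, 1.0]

for y in [0.0, 1.0, 2.0, 3.0, 4.0]:
    value, piece = inner_value(y, b, a, c, d)
    print(f"y = {y:.1f}  value = {value:.1f}  active constraint = {piece}")
```

Within each piece the active constraint stays the same, which loosely mirrors the role of a fixed basis whose reduced costs keep the correct sign in (4.13).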

It is interesting to note that if we had started with the dual
of (4.9), a block-diagonal linear program with coupling constraints,
we would have obtained precisely the same procedure as the one just
described by dualizing with respect to the coupling constraints only
[Geoffrion 69] and then invoking the Piecewise strategy. In this way
[Rosen 64] could also be used to illustrate the pattern
DUALIZATION/PIECEWISE.



4.3  DANTZIG-WOLFE  DECOMPOSITION 

Dantzig-Wolfe  Decomposition  is  archetypical  of  the  pattern 
INNER  LINEARIZATION/RESTRICTION.  Mechanized  pricing  plays  a  prominent 
role.  We  shall  illustrate  this  pattern  first  with  the  algorithm  of 
[Dantzig  and  Wolfe  60]  for  a  purely  linear  program,  then  with  the 
algorithm  of  [Dantzig  63a,  Ch.  24]  for  a  nonlinear  program,  and  finally 
with a variation of the latter in which not all nonlinear functions
need be inner-linearized.

It is interesting to note that Dantzig-Wolfe Decomposition can
also be viewed as an instance of the pattern DUALIZATION, OUTER
LINEARIZATION/RELAXATION. In the context of (4.15), for example, one
would dualize with respect to the constraints $\bar{A}x \le \bar{b}$, outer-linearize
the resulting minimand in the obvious way, and then apply Relaxation.

[Dantzig and Wolfe 60]

The well-known Dantzig-Wolfe decomposition approach for linear
programs will be explained in terms of the linear program

(4.15)  $\underset{x \ge 0}{\text{Maximize}} \;\; c^t x \quad \text{s.t.} \quad Ax \le b, \quad \bar{A}x \le \bar{b},$

where we have arbitrarily divided the constraints into two groups.
With the definition

(4.16)  $X \triangleq \{\, x \ge 0 : Ax \le b \,\},$

we may write (4.15) as

(4.17)  $\underset{x \in X}{\text{Maximize}} \;\; c^t x \quad \text{s.t.} \quad \bar{A}x \le \bar{b}.$


Since $X$ is a convex polytope, we know (Th. 3) that it admits an exact
inner linearization using only a finite number of points. Invoking
this representation for $X$, we obtain a master linear program with a vast
number of variables to which Restriction can be applied in the form
of the Simplex Method. It turns out that the pricing operation (cf.
Sec. 3.2) can be accomplished by solving a linear subproblem
whose feasible region is $X$. The details are as follows.

Assume that $X$ is not empty and also, for ease of exposition only,
that $X$ is bounded. Then $X$ can be represented in terms of its extreme
points $\langle x^1, \ldots, x^p \rangle$, and (4.17) can be written as the equivalent master
linear program

(4.18)  $\underset{\alpha \ge 0}{\text{Maximize}} \;\; c^t \Bigl( \sum_{j=1}^{p} \alpha_j x^j \Bigr) \quad \text{s.t.} \quad \sum_{j=1}^{p} \alpha_j = 1, \quad \bar{A} \Bigl( \sum_{j=1}^{p} \alpha_j x^j \Bigr) \le \bar{b}.$

The Simplex Method for this problem corresponds to Restriction with
respect to the constraints $\alpha \ge 0$.* To describe how the pricing
operation can be mechanized, we shall use the familiar terminology of
linear programming rather than the general terminology of Restriction.
The optimality conditions at the general iteration are $u \ge 0$ and

(4.19)  $u_0 + u^t \bar{A} x^j - c^t x^j \ge 0, \quad j = 1, \ldots, p,$

where $u_0$ and the vector $u$ are the current Simplex multipliers.

* Actually, the inequality constraints involving $\bar{A}$ are also normally
considered as candidates for restriction to equality. The latter
constraints can be excluded, if desired, from the candidates for
restriction by giving $u \ge 0$ priority over (4.19) in determining the
entering basic variable. Such a modification is necessary, as we shall
see later in this subsection, when nonlinear functions are
inner-linearized.

Condition (4.19) is equivalent to

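$\Bigl[\, u_0 + \underset{1 \le j \le p}{\text{Minimum}} \,\bigl\{ (u^t \bar{A} - c^t)\, x^j \bigr\} \Bigr] \ge 0$

or, since $\langle x^1, \ldots, x^p \rangle$ span $X$, to

(4.20)  $\Bigl[\, u_0 + \underset{x \in X}{\text{Min}} \; (u^t \bar{A} - c^t)\, x \,\Bigr] \ge 0.$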

The linear program in this expression is a valid replacement for the
finite minimum in the previous expression because the minimum of a
linear function over $X$ occurs at an extreme point. Thus we see how
to test optimality when the Simplex Method is applied to (4.18). If
either $u \ge 0$ or (4.20) fails to hold, a profitable nonbasic variable
satisfying the usual criterion for the entering variable is obtained
automatically: if the greatest violation occurs in $u \ge 0$, introduce
the corresponding slack variable; if in (4.20), introduce the variable
$\alpha_{j_0}$, where $x^{j_0}$ is an optimal basic feasible solution of the
linear program in (4.20) (the extremal function coefficient of $\alpha_{j_0}$ is
$c^t x^{j_0}$, and the technological coefficient column is unity followed by
$\bar{A} x^{j_0}$).
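The optimality test (4.20) and the construction of the entering column can be sketched in a few lines when $X$ is simple enough for the subproblem to be solved by inspection. The sketch below assumes, purely for illustration, that $X$ is a box $\{x : 0 \le x \le \text{upper}\}$, so that minimizing a linear function over $X$ reduces to a sign check on each coefficient. The function name `price_out`, the data, and the multiplier values are hypothetical; a general $X$ would need a linear programming code in place of the inspection step.

```python
def price_out(c, A_bar, u0, u, upper):
    """Evaluate u0 + min over the box 0 <= x <= upper of (u^t A_bar - c^t) x.

    Returns the value of the bracket in (4.20), the minimizing extreme
    point x, and the master data (cost, column) of the corresponding
    alpha variable.
    """
    n = len(c)
    m = len(u)
    # Coefficients of the linear subproblem objective, (u^t A_bar - c^t).
    d = [sum(u[i] * A_bar[i][j] for i in range(m)) - c[j] for j in range(n)]
    # Over a box, a linear function is minimized at a vertex chosen by sign.
    x = [upper[j] if d[j] < 0 else 0.0 for j in range(n)]
    value = u0 + sum(d[j] * x[j] for j in range(n))
    # Master data for alpha: objective coefficient c^t x, column (1, A_bar x).
    cost = sum(c[j] * x[j] for j in range(n))
    column = [1.0] + [sum(A_bar[i][j] * x[j] for j in range(n)) for i in range(m)]
    return value, x, cost, column


# Hypothetical instance: maximize 3*x1 + 2*x2 s.t. x1 + x2 <= 5, 0 <= x <= 4.
c = [3.0, 2.0]
A_bar = [[1.0, 1.0]]
upper = [4.0, 4.0]

# Trial multipliers that are not optimal: the test value is negative,
# so the returned column is a candidate to enter the basis.
print(price_out(c, A_bar, u0=0.0, u=[2.5], upper=upper))

# Multipliers for which the test value is zero, so (4.20) holds.
print(price_out(c, A_bar, u0=4.0, u=[2.0], upper=upper))
```

The returned column follows the description above: unity for the convexity row, followed by $\bar{A}$ applied to the extreme point found by the subproblem.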

Thus there is no difficulty in carrying out the Simplex Method
applied to the master problem (4.18). Each iteration requires solving
the linear subproblem in (4.20).* This approach may possess an
advantage over the direct application of the Simplex Method to (4.15) when
the subproblem has some special structure. For example, if (4.15) is
a transportation problem with additional constraints, then the
subproblem becomes a pure transportation problem if $\bar{A}$ is taken to comprise
the additional constraints. Another example is the case in which $A$ is
block-diagonal, for then the subproblem separates into $k$ independent
smaller linear programs. In general, one should select a grouping of
the constraints (in terms of $A$ and $\bar{A}$) that isolates a special structure,
and then exploit this structure in dealing with (4.20). See [Broise,
Huard and Sentenac 68], [Orchard-Hays 68, Sec. 10.4] for additional
discussion based on computational experience.

* The subproblem need be solved from scratch only at the first
iteration; thereafter, restarting or parametric techniques can be
used to recover an optimum as $u$ changes from iteration to iteration.

[Dantzig 63a, Ch. 24]

Now consider a nonlinear version of (4.17), namely

(4.21)  $\underset{x \in X}{\text{Maximize}} \;\; f(x) \quad \text{s.t.} \quad g_i(x) \le b_i, \quad i = 1, \ldots, m,$

where $X$ is a convex set, $f$ is concave on $X$, and each $g_i$ is convex on $X$.
Dantzig and Wolfe's approach [Dantzig 63a, Ch. 24] for this problem
can be viewed as follows. Let $f$ and each $g_i$ be approximated by
Inner Linearization over an arbitrarily fine base $\langle x^1, x^2, \ldots \rangle$ in
$X$, so that (4.21) is approximated as closely as desired (in
principle, at least) by the linear master problem

(4.22)  $\underset{\alpha \ge 0}{\text{Maximize}} \;\; \sum_j \alpha_j f(x^j) \quad \text{s.t.} \quad \sum_j \alpha_j = 1, \quad \sum_j \alpha_j g_i(x^j) \le b_i, \quad i = 1, \ldots, m.$

We say "in principle" because we do not wish to actually evaluate $f$
and each $g_i$ at every point in the base, or even specify the base
explicitly. Hence it is natural to solve (4.22) by Restriction with
the constraints $\alpha \ge 0$ as the candidates for restriction to equality
(when $\alpha_j$ is restricted to 0, the values $f(x^j)$ and $g_i(x^j)$ are not
needed). A very natural way to do this is to employ the Simplex
Method with a priority convention to ensure that the restricted
problems are truly optimized: slack variables corresponding to the
inequality constraints must be given priority over structural variables in
determining which variable is to enter a basis. Any feasible solution of
(4.21) can be used to find an initial basic feasible solution, and at
the general iteration the optimality criterion or pricing problem is
(cf. (4.19)) $u_i \ge 0$ ($1 \le i \le m$) and

(4.23)  $u_0 + \sum_{i=1}^{m} u_i g_i(x^j) - f(x^j) \ge 0, \quad \text{all } j,$

where $u_0, u_1, \ldots, u_m$ are the current Simplex multipliers. By the
priority convention, we may assume that $u_i \ge 0$ ($1 \le i \le m$). Note that
(4.23) is intimately related (cf. (4.20)) to the convex subproblem

(4.24)  $\underset{x \in X}{\text{Minimize}} \;\; \sum_{i=1}^{m} u_i g_i(x) - f(x).$


If $u_0$ plus the optimal value of this problem is nonnegative, then
(4.23) holds and an optimal solution of (4.21) is at hand
[$x^* = \sum_j \alpha_j x^j$, where $\alpha$ is the current and optimal solution of (4.22)];
otherwise, an optimal or near-optimal solution $x$ of (4.24) can be
profitably added to the current explicit base by introducing the
corresponding $\alpha_j$ into the basis in the usual way after evaluating
$f(x)$ and $g_i(x)$. In practice, termination would take place as soon as
the value of the current approximation to an optimal solution of
(4.21), the quantity $f(\sum_j \alpha_j x^j)$, approaches closely enough the
following easily demonstrated upper bound for the true optimal value:

(4.25)  $\sum_{i=1}^{m} u_i b_i \;-\; \underset{x \in X}{\text{Min}} \,\Bigl[\, \sum_{i=1}^{m} u_i g_i(x) - f(x) \Bigr].$


This approach is particularly attractive when the structure is
such that (4.24) is relatively tractable by comparison with (4.21);
for example, when $X$ is an open set and $f$ and $g_i$ are differentiable,
or when (4.24) is separable into several independent subproblems.
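To make the bound (4.25) concrete, the following sketch evaluates it on a one-dimensional instance of (4.21). Everything specific here is an assumption chosen for illustration: $X = [0, 2]$, $f(x) = -(x - 1.5)^2$, a single constraint $g(x) = x \le 1$, the trial multipliers, and the helper names. The subproblem (4.24) is solved by golden-section search, which is adequate only because the minimand is a convex function of one variable.

```python
import math


def golden_section_min(phi, lo, hi, tol=1e-10):
    """Minimize a convex function phi on [lo, hi] by golden-section search."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        x1 = b - ratio * (b - a)
        x2 = a + ratio * (b - a)
        if phi(x1) <= phi(x2):
            b = x2          # a minimizer lies in [a, x2]
        else:
            a = x1          # a minimizer lies in [x1, b]
    x = 0.5 * (a + b)
    return x, phi(x)


def f(x):
    return -(x - 1.5) ** 2   # concave objective (assumed)


def g(x):
    return x                 # convex constraint function, g(x) <= b_rhs


b_rhs = 1.0
X_lo, X_hi = 0.0, 2.0


def upper_bound(u):
    """Bound (4.25) for one constraint: u*b - min over X of [u*g(x) - f(x)].

    The minimum is computed numerically, so the bound is valid only up to
    the tolerance of the search.
    """
    _, sub_value = golden_section_min(lambda x: u * g(x) - f(x), X_lo, X_hi)
    return u * b_rhs - sub_value


true_optimum = f(1.0)        # the constraint x <= 1 is binding at the optimum
for u in [0.0, 0.5, 1.0, 2.0]:
    print(f"u = {u:.1f}  bound = {upper_bound(u):.6f}  optimum = {true_optimum:.6f}")
```

For this instance the bound is tight at $u = 1$ and strictly above the optimal value for the other trial multipliers, which is the behavior a termination test based on (4.25) relies on.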


A  Variant 


It is interesting to observe that Inner Linearization need not
be applied to all nonlinear functions of (4.21).* An advantage can
sometimes be gained by inner-linearizing only a subset of the
nonlinear functions, say $g_1, \ldots, g_{m_1}$ ($m_1 < m$). Then instead of (4.22)
we have the concave master problem

(4.26)  $\begin{aligned} \underset{\alpha \ge 0}{\text{Maximize}} \;\; & f\Bigl( \sum_j \alpha_j x^j \Bigr) \\ \text{s.t.} \;\; & \sum_j \alpha_j = 1, \\ & \sum_j \alpha_j g_i(x^j) \le b_i, \quad i = 1, \ldots, m_1, \\ & g_i\Bigl( \sum_j \alpha_j x^j \Bigr) \le b_i, \quad i = m_1 + 1, \ldots, m. \end{aligned}$

* In [Whinston 66], for example, the objective function of a
block-diagonal quadratic program with coupling constraints is not
inner-linearized.

Again we wish to apply Restriction with only the nonnegativity
constraints $\alpha \ge 0$ as candidates for restriction to equality. The Simplex
Method can no longer be adapted to this purpose, however, since (4.26)
is not a linear program. Implementation requires a concave programming
algorithm for solving the restricted versions of (4.26) and also a
means of mechanizing the pricing operation. We need not discuss the
first requirement. The second involves being able to determine the
prices for all $j$ in $S$, where $S$ is the current set of indices for
which $\alpha_j$ is restricted to value 0. This can be done as follows [Holloway
69]. Let $\alpha^S$ be the optimal solution to (4.26) with the additional
restrictions $\alpha_j = 0$ for $j \in S$, and let $u_0^S, u_1^S, \ldots, u_m^S$ be the associated
optimal multipliers (which must exist if a constraint qualification is
satisfied). Then, assuming all functions are continuously
differentiable, the price associated with $\alpha_j = 0$ is given for all $j \in S$ by


It follows that the pricing problem can be solved by optimizing the
convex ($u^S \ge 0$) subproblem

(4.29)  $\underset{x \in X}{\text{Minimize}} \;\; -\nabla f(x^S)\, x + \sum_{i=1}^{m_1} u_i^S g_i(x) + \sum_{i=m_1+1}^{m} u_i^S\, \nabla g_i(x^S)\, x.$

Compare with (4.24). If $f$ were inner-linearized too, the first term
of the minimand of (4.29) would be $-f(x)$.

Which of all given constraints should be incorporated into $X$,
and which of the remainder (and whether $f$ itself) should be
inner-linearized, depends mainly on the availability of efficient algorithms for the
resulting versions of (4.29) and (4.26) with $\alpha_j = 0$ for $j \in S$.

4.4  [Geoffrion 68b, Sec. 4]

A quite general problem with multidivisional structure is

(4.30)  $\begin{aligned} \underset{x_1, \ldots, x_k}{\text{Maximize}} \;\; & \sum_{i=1}^{k} f_i(x_i) \\ \text{s.t.} \;\; & h_i(x_i) \ge 0, \quad i = 1, \ldots, k, \\ & \sum_{i=1}^{k} g_i(x_i) \ge b, \end{aligned}$

where $f_i$, $h_i$ and $g_i$ are all concave differentiable functions of the
vector $x_i$. The subscript $i$ can be thought of as indexing the individual
divisions, which are linked together only by coupling constraints. The
approach of [Geoffrion 68b, Sec. 4] is an application of the pattern
PROJECTION/FEASIBLE DIRECTIONS. The optimization of (4.30) is carried
out largely at the divisional level subject to central coordination.

First (4.30) is projected onto the space of its coupling
constraints. This requires introducing the vectors $y_1, \ldots, y_k$:

(4.31)  $\begin{aligned} \underset{x,\,y}{\text{Maximize}} \;\; & \sum_{i=1}^{k} f_i(x_i) \\ \text{s.t.} \;\; & h_i(x_i) \ge 0, \quad i = 1, \ldots, k, \\ & g_i(x_i) \ge y_i, \quad i = 1, \ldots, k, \\ & \sum_{i=1}^{k} y_i \ge b. \end{aligned}$

In effect, this changes the given problem from one with coupling
constraints to one with coupling variables, since (4.31) separates into
$k$ separate problems if $y$ is held fixed temporarily. One may interpret
$y_i$ as a vector of resources and tasks assigned to the $i$th division.
Projection of this problem onto $y$ yields the master problem

(4.32)  $\underset{y}{\text{Maximize}} \;\; \sum_{i=1}^{k} v_i(y_i) \quad \text{s.t.} \quad \sum_{i=1}^{k} y_i \ge b,$

where $v_i$ is defined as the supremal value of the parameterized
divisional problem

(4.33)  $\underset{x_i}{\text{Maximize}} \;\; f_i(x_i) \quad \text{s.t.} \quad h_i(x_i) \ge 0, \quad g_i(x_i) \ge y_i.$


Now we wish to apply the Feasible Directions strategy to (4.32).
The idea of this strategy, it will be recalled, is to generate an
improving sequence of feasible points, with each new point determined
from the previous one by selecting an improving feasible direction
and then maximizing along a line emanating in this direction. The
latter maximization is only one-dimensional, and can easily be
essentially decentralized to the divisional level. The chief difficulty
with this strategy concerns how to find a good improving feasible
direction, for the maximand $\sum_{i=1}^{k} v_i(y_i)$ is not everywhere differentiable
and is available only implicitly in terms of the divisional problems
(4.33). It can nevertheless be shown [ibid., Sec. 4.2], using the theory
of subgradients for concave functions and the optimality conditions
associated with (4.33), that the following explicit linear program yields
an improving feasible direction $z^0$ for (4.32) at a feasible point
$y^0$; moreover, $z^0$ is best among all feasible directions in that it
maximizes the initial rate of improvement of $\sum_{i=1}^{k} v_i(y_i)$:

(4.34)  $\begin{aligned} \underset{w,\,z}{\text{Maximize}} \;\; & \sum_{i=1}^{k} \nabla f_i^0\, w_i \\ \text{s.t.} \;\; & \nabla g_{ij}^0\, w_i - z_{ij} \ge 0, \quad i = 1, \ldots, k, \;\; j \text{ such that } g_{ij}^0 = y_{ij}^0, \\ & \nabla h_{ij}^0\, w_i \ge 0, \quad i = 1, \ldots, k, \;\; j \text{ such that } h_{ij}^0 = 0, \\ & \sum_{i=1}^{k} z_{ij} \ge 0, \quad j \text{ such that } \sum_{i=1}^{k} y_{ij}^0 = b_j, \\ & -1 \le z_{ij} \le 1, \quad \text{all } i \text{ and } j. \end{aligned}$

Here $\nabla g_{ij}^0$ refers to a row vector that is the gradient of $g_{ij}$ evaluated
at an optimal solution of (4.33) with $y_i = y_i^0$, and the other
superscripted quantities have similar definitions. The vector $w_i$ has the
same dimension as $x_i$. This subproblem enables the Feasible Directions
strategy for (4.32) to be carried out.
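A two-division toy problem may help fix ideas about the master problem (4.32) and the Feasible Directions strategy. The instance is an assumption made for illustration: each division has a scalar activity $x_i \ge 0$, $f_i(x_i) = -c_i x_i^2$, and $g_i(x_i) = x_i$, so that $y_i$ is a task level assigned to division $i$ and $v_i(y_i) = -c_i \max(y_i, 0)^2$ in closed form. With $k = 2$ and the coupling constraint $y_1 + y_2 \ge b$ active, a best direction can be taken to be $z = (1, -1)$ or $z = (-1, 1)$, so the sketch chooses between them by comparing derivatives instead of solving (4.34) as a linear program. All function names are hypothetical.

```python
def v(c_i, y_i):
    """Value of the divisional problem (4.33) for this toy instance:
    maximize -c_i*x**2 subject to x >= 0 and x >= y_i."""
    return -c_i * max(y_i, 0.0) ** 2


def dv(c_i, y_i):
    """Derivative of v with respect to y_i."""
    return -2.0 * c_i * max(y_i, 0.0)


def line_search(phi, t_max, iters=200):
    """Maximize a concave function phi on [0, t_max] by ternary search."""
    lo, hi = 0.0, t_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if phi(m1) < phi(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def allocate(c, b, y, tol=1e-5, max_iter=50):
    """Feasible-directions iteration on (4.32) for two divisions,
    starting from a point y on the coupling constraint y[0] + y[1] == b."""
    if abs(y[0] + y[1] - b) > 1e-12:
        raise ValueError("starting point must satisfy y[0] + y[1] == b")
    y = list(y)
    for _ in range(max_iter):
        d0, d1 = dv(c[0], y[0]), dv(c[1], y[1])
        if abs(d0 - d1) <= tol:          # no improving feasible direction
            break
        z = (1.0, -1.0) if d0 > d1 else (-1.0, 1.0)
        # Largest step keeping both task levels nonnegative (an assumption
        # used here only to bound the line search).
        t_max = y[1] if z[0] > 0 else y[0]
        phi = lambda t: v(c[0], y[0] + t * z[0]) + v(c[1], y[1] + t * z[1])
        t = line_search(phi, t_max)
        y = [y[0] + t * z[0], y[1] + t * z[1]]
    return y


# Hypothetical data: cost coefficients (1, 3), total task b = 4,
# starting with the whole task assigned to division 1.
y_final = allocate(c=(1.0, 3.0), b=4.0, y=(4.0, 0.0))
print(y_final)
```

For these data the iteration settles near $y = (3, 1)$, where both divisions have the same marginal value, so that no exchange of tasks between them improves the total.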


4.5  [Takahashi 64]

Consider

(4.35)  $\underset{x}{\text{Maximize}} \;\; f(x) \quad \text{s.t.} \quad H(x) \ge 0, \quad G(x) = 0,$

where $f$ is concave and all constraints are linear. Suppose that the
$G$ constraints are complicating in the sense that the problem would
be much easier if they were not present. For instance, the complicating
constraints may be the coupling constraints of a structure similar to
the one in the previous subsection, or they may spoil what would
otherwise be a special structure for which efficient solution methods would
be available. The pattern of the "local" approach of [Takahashi 64]
for this problem is DUALIZATION/FEASIBLE DIRECTIONS.

The dual of (4.35) with respect to the complicating constraints
only yields (see, e.g., [Rockafellar 68] or [Geoffrion 69]) the following
problem in the space of the dual variables $\lambda$ (a vector whose dimension
matches $G$):

(4.36)  $\underset{\lambda}{\text{Minimize}} \;\; v(\lambda),$

where $v(\lambda)$ is defined as the supremal value of the parameterized
problem

(4.37)  $\underset{x}{\text{Maximize}} \;\; f(x) + \lambda^t G(x) \quad \text{s.t.} \quad H(x) \ge 0.$

Note that (4.37) is of the same form as (4.35) except the complicating
constraints are now part of the criterion function.

To apply the Feasible Directions strategy to (4.36), we must be
able to identify an improving feasible direction. Any direction is
feasible, of course, since $\lambda$ is unconstrained. When $f$ is strictly
concave, it can be shown that $v$ is differentiable. Its gradient at
a point $\lambda^0$ is simply $G(x^0)$, where $x^0$ is the optimal solution of
(4.37) with $\lambda = \lambda^0$. Hence the Feasible Directions strategy can be
carried out for (4.36) using the negative of the gradient of $v$
as the improving feasible direction. Actually, Takahashi proposes
a short-step method rather than requiring a one-dimensional
minimization to be performed in order to determine step size. The
procedure may be summarized as follows.

1.  Choose a starting point $\lambda^0$.

2.  Solve (4.37) with $\lambda = \lambda^0$ for its optimal solution $x^0$. If
$G(x^0) = 0$, then $x^0$ is optimal in (4.35); otherwise, go on
to Step 3.

3.  Let $\lambda^1 = \lambda^0 - \varepsilon G(x^0)$, where $\varepsilon$ is a small positive constant,
and return to Step 2 with $\lambda^1$ in place of $\lambda^0$.
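Steps 1 to 3 can be sketched on a small instance for which (4.37) has a closed-form solution. The example below is an assumption made for illustration, not a problem taken from [Takahashi 64]: $f(x) = -\tfrac{1}{2}\lVert x - a \rVert^2$ is strictly concave, there are no $H$ constraints, and the single complicating constraint is $G(x) = g^t x - d = 0$. For this choice the maximizer of $f(x) + \lambda G(x)$ is $x = a + \lambda g$. The names `solve_inner` and `short_step_dual`, the step size, and the stopping tolerance are likewise hypothetical.

```python
def solve_inner(lam, a, g):
    """Maximizer of -0.5*||x - a||^2 + lam*(g.x - d) over all x (closed form)."""
    return [a_j + lam * g_j for a_j, g_j in zip(a, g)]


def G(x, g, d):
    """Complicating constraint residual G(x) = g.x - d."""
    return sum(g_j * x_j for g_j, x_j in zip(g, x)) - d


def short_step_dual(a, g, d, eps=0.1, lam=0.0, tol=1e-10, max_iter=10000):
    """Steps 1-3: repeat lam <- lam - eps*G(x(lam)) until G(x) is near 0."""
    for _ in range(max_iter):
        x = solve_inner(lam, a, g)          # Step 2: solve (4.37)
        residual = G(x, g, d)
        if abs(residual) <= tol:            # G(x) = 0 up to the tolerance
            return x, lam
        lam = lam - eps * residual          # Step 3: short step
    raise RuntimeError("no convergence; eps may be too large")


# Hypothetical data: the point of the line x1 + x2 = 2 closest to a = (3, 1).
x_opt, lam_opt = short_step_dual(a=[3.0, 1.0], g=[1.0, 1.0], d=2.0)
print(x_opt, lam_opt)
```

For this instance each pass multiplies the residual $G(x)$ by the factor $1 - \varepsilon \lVert g \rVert^2$, so the iteration converges only when $0 < \varepsilon < 2/\lVert g \rVert^2$; this is one way to see why the constant in Step 3 must be small.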


5.  CONCLUSION 


We have attempted to develop a framework of unifying concepts that
comprehends much of the literature on large-scale mathematical
programming. If we have been successful, the non-specialist should have an
overview of the field that facilitates further study, and the advanced
reader should feel that he has a deeper understanding of previously
familiar algorithms and that he perceives new commonalities among
approaches that heretofore seemed to be related only vaguely if at all.

In addition, we hope that the framework will suggest a variety of
worthwhile topics for investigation. The problem manipulations and
solution strategies discussed here all invite further study, and others
should be added to the fold so that additional algorithms can be
encompassed. The algorithms falling within the purview of each particular
manipulation/strategy pattern (cf. Table 2) should be studied carefully
in relation to one another, with the aim of learning how "best" to use
the tactical options of the pattern and organize the computations for
various classes of problems.

The relationships between ostensibly different patterns also
warrant further study. We mentioned in Sec. 3.3 that Restriction
(Relaxation) is essentially equivalent to Dualization followed by Relaxation
(Restriction), and other equivalences were briefly noted in Secs. 4.2
and 4.3. Many others exist; for example, it has often been observed
that Dantzig-Wolfe and Benders Decomposition are dual to one another
in an appropriate sense. The results of [Zoutendijk 60; Secs. 9.4,
10.3, 11.4] are in this spirit, even if they do not specifically
involve algorithms for large-scale programming. Knowledge of such
relations reduces the number of essentially different patterns to be
considered, and enables meaningful comparisons among the remainder.

Investigations along these lines should help civilize the jungle
of extant algorithms and pave the way for truly significant
computational studies.